Re: Review Request 39060: Create master detector per url & not per framework.

2016-01-05 Thread Mandeep Chadha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39060/
---

(Updated Jan. 5, 2016, 9:01 a.m.)


Review request for mesos and Joris Van Remoortere.


Summary (updated)
-

Create master detector per url & not per framework.


Bugs: MESOS-3595
https://issues.apache.org/jira/browse/MESOS-3595


Repository: mesos


Description
---

If the number of framework created exceeds the lib process
threads then during master failover the zookeeper updates can
cause deadlock.


Diffs
-

  include/mesos/scheduler.hpp 049c041286f3167e79cc5ea8a9e0bf8d42569832 
  src/sched/sched.cpp 44eb4f50e8ed84297268d94a3a0320c843ff6d8c 
  src/tests/mesos.hpp a4811b0d8dee33ff2ca4968f532ce64b7ea95249 

Diff: https://reviews.apache.org/r/39060/diff/


Testing
---

make check successful 
Created 100 framework instances on a 24 CPU machine. Master failover detected 
by the framework process and continue to work as expected.


Thanks,

Mandeep Chadha



Re: Review Request 39060: Create master detector per url & not per framework

2016-01-05 Thread Mandeep Chadha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39060/
---

(Updated Jan. 5, 2016, 8:48 a.m.)


Review request for mesos and Joris Van Remoortere.


Bugs: MESOS-3595
https://issues.apache.org/jira/browse/MESOS-3595


Repository: mesos


Description
---

If the number of framework created exceeds the lib process
threads then during master failover the zookeeper updates can
cause deadlock.


Diffs (updated)
-

  include/mesos/scheduler.hpp 049c041286f3167e79cc5ea8a9e0bf8d42569832 
  src/sched/sched.cpp 44eb4f50e8ed84297268d94a3a0320c843ff6d8c 
  src/tests/mesos.hpp a4811b0d8dee33ff2ca4968f532ce64b7ea95249 

Diff: https://reviews.apache.org/r/39060/diff/


Testing
---

make check successful 
Created 100 framework instances on a 24 CPU machine. Master failover detected 
by the framework process and continue to work as expected.


Thanks,

Mandeep Chadha



Re: Review Request 39060: Create master detector per url & not per framework

2016-01-04 Thread Mandeep Chadha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39060/
---

(Updated Jan. 5, 2016, 3:17 a.m.)


Review request for mesos and Joris Van Remoortere.


Bugs: MESOS-3595
https://issues.apache.org/jira/browse/MESOS-3595


Repository: mesos


Description
---

If the number of framework created exceeds the lib process
threads then during master failover the zookeeper updates can
cause deadlock.


Diffs (updated)
-

  include/mesos/scheduler.hpp 049c041286f3167e79cc5ea8a9e0bf8d42569832 
  src/sched/sched.cpp 44eb4f50e8ed84297268d94a3a0320c843ff6d8c 
  src/tests/mesos.hpp a4811b0d8dee33ff2ca4968f532ce64b7ea95249 

Diff: https://reviews.apache.org/r/39060/diff/


Testing
---

make check successful 
Created 100 framework instances on a 24 CPU machine. Master failover detected 
by the framework process and continue to work as expected.


Thanks,

Mandeep Chadha



Re: Review Request 40730: Fixing MESOS-3552 by using CHECK_NEAR to avoid errors due to double precision errors.

2015-11-26 Thread Mandeep Chadha


> On Nov. 26, 2015, 3:14 p.m., Neil Conway wrote:
> > src/common/resources.cpp, line 951
> > <https://reviews.apache.org/r/40730/diff/1/?file=1147236#file1147236line951>
> >
> > This definitely needs an explanatory comment.
> 
> Bernd Mathiske wrote:
> I'll propose a patch that addresses these issues.

What if results or cpus returns isNone or one of them is isNone ( Option ) , 
then get() is going to crash. CHECK_NEAR() does the same thing as the previous 
review request and this is the history of previous review request :


"CHECK_DOUBLE_EQ will also fail. 

F0930 18:15:38.169140 26984 resources.cpp:874] Check failed: 
(result.cpus().get()) >= (cpus().get())-0.001L (24 vs. 24)  Check 
failure stack trace:  F0930 18:15:38.169322 26991 resources.cpp:874] Check 
failed: (result.cpus().get()) >= (cpus().get())-0.001L (24 vs. 24)  
Check failure stack trace:

CHECK_NEAR() is the right Macro to use. 


But we also need to check (isNone() and isNone()) equality and hence the 
present implementation. I think the long term fix is to wrap double into Double 
and have the opeartor== to the right thing.

" 

Using CHECK_NEAR() is fine but not checking other state of Option<> may lead to 
crash and is not complete.


- Mandeep


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40730/#review108140
---


On Nov. 26, 2015, 6:52 a.m., Avinash sridharan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40730/
> ---
> 
> (Updated Nov. 26, 2015, 6:52 a.m.)
> 
> 
> Review request for mesos, Bernd Mathiske, Klaus Ma, and Neil Conway.
> 
> 
> Bugs: MESOS-3552
> https://issues.apache.org/jira/browse/MESOS-3552
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> A version of this fix was already proposed by Mandeep Chadha (@mchadha) and 
> submitted to the reivew board https://reviews.apache.org/r/39056/ . So the 
> main credit should go to him for this fix.
> 
> 
> Diffs
> -
> 
>   src/common/resources.cpp b4abf5405039d7d0a5028ccf034ad2e9623d064c 
> 
> Diff: https://reviews.apache.org/r/40730/diff/
> 
> 
> Testing
> ---
> 
> Added Mandeep's test case to GTEST https://reviews.apache.org/r/39056/ and 
> enabled an existing precision test (https://reviews.apache.org/r/40732/) to 
> test this fix.
> 
> 
> Thanks,
> 
> Avinash sridharan
> 
>



Re: Review Request 39056: Fix for Mesos master crash due to check failure.

2015-11-03 Thread Mandeep Chadha


> On Oct. 7, 2015, 1:03 a.m., Guangya Liu wrote:
> > src/common/resources.cpp, line 879
> > <https://reviews.apache.org/r/39056/diff/3/?file=1092072#file1092072line879>
> >
> > The meos is now using 0.01 as the MIN_CPUS
> 
> Guangya Liu wrote:
> As the mesos is using 0.01 as the MIN_CPUS, I think it is OK using 0.01 
> as the epsilon

Are we ok with changes ? Friendly ping :)


- Mandeep


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39056/#review101729
-------


On Oct. 6, 2015, 10:07 p.m., Mandeep Chadha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39056/
> ---
> 
> (Updated Oct. 6, 2015, 10:07 p.m.)
> 
> 
> Review request for mesos and Neil Conway.
> 
> 
> Bugs: MESOS-3552
> https://issues.apache.org/jira/browse/MESOS-3552
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Check failed due to double comparison : MESOS-3552.
> 
> 
> Diffs
> -
> 
>   src/common/resources.cpp 601388c35a1bff37c58e753d1870d53b8d0af2d1 
>   src/tests/reservation_tests.cpp 6b7c43c8b5c64618249dbee926383242320c111e 
>   src/v1/resources.cpp dc868903472f8f3a1ddc56092e3f8f81d953ce39 
> 
> Diff: https://reviews.apache.org/r/39056/diff/
> 
> 
> Testing
> ---
> 
> Added unit test.
> make check successful.
> 
> 
> Thanks,
> 
> Mandeep Chadha
> 
>



Re: Review Request 39060: Create master detector per url & not per framework

2015-11-03 Thread Mandeep Chadha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39060/
---

(Updated Nov. 3, 2015, 9:32 a.m.)


Review request for mesos and Joris Van Remoortere.


Bugs: MESOS-3595
https://issues.apache.org/jira/browse/MESOS-3595


Repository: mesos


Description
---

If the number of framework created exceeds the lib process
threads then during master failover the zookeeper updates can
cause deadlock.


Diffs (updated)
-

  include/mesos/scheduler.hpp f571d42d1508632152473c4f4ade60ae3900fce1 
  src/sched/sched.cpp 9c5e3b8e42605f4647d897ba02ea3a17e494f355 

Diff: https://reviews.apache.org/r/39060/diff/


Testing
---

make check successful 
Created 100 framework instances on a 24 CPU machine. Master failover detected 
by the framework process and continue to work as expected.


Thanks,

Mandeep Chadha



Re: Review Request 39060: Create master detector per url & not per framework

2015-11-03 Thread Mandeep Chadha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39060/
---

(Updated Nov. 3, 2015, 6:06 p.m.)


Review request for mesos and Joris Van Remoortere.


Bugs: MESOS-3595
https://issues.apache.org/jira/browse/MESOS-3595


Repository: mesos


Description
---

If the number of framework created exceeds the lib process
threads then during master failover the zookeeper updates can
cause deadlock.


Diffs (updated)
-

  include/mesos/scheduler.hpp f571d42d1508632152473c4f4ade60ae3900fce1 
  src/sched/sched.cpp 9c5e3b8e42605f4647d897ba02ea3a17e494f355 

Diff: https://reviews.apache.org/r/39060/diff/


Testing
---

make check successful 
Created 100 framework instances on a 24 CPU machine. Master failover detected 
by the framework process and continue to work as expected.


Thanks,

Mandeep Chadha



Re: Review Request 39060: Create master detector per url & not per framework

2015-10-21 Thread Mandeep Chadha


> On Oct. 21, 2015, 12:41 a.m., Joris Van Remoortere wrote:
> > Hey Mandeep, Thanks for taking on this important work!
> > Some high level questions:
> > - Do we need to do reference counting manually or can we use a construct 
> > like `Shared` in Libprocess that wraps the `std::shared_ptr`.
> > - Do we want to surface the singleton pattern in this way? What are the 
> > impacts of this on the user (i.e. your logic calling the `acquire` / 
> > `release` code currently? Does the user care about holding a reference to 
> > the singleton?
> > 
> > Regardless, I added some style comments as they are valuable for you as a 
> > contributor.

Thanks Joris for the comments. Let me get that done and upload the new diff.


- Mandeep


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39060/#review103331
-------


On Oct. 8, 2015, 5:05 p.m., Mandeep Chadha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39060/
> ---
> 
> (Updated Oct. 8, 2015, 5:05 p.m.)
> 
> 
> Review request for mesos and Joris Van Remoortere.
> 
> 
> Bugs: MESOS-3595
> https://issues.apache.org/jira/browse/MESOS-3595
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> If the number of framework created exceeds the lib process
> threads then during master failover the zookeeper updates can
> cause deadlock.
> 
> 
> Diffs
> -
> 
>   src/sched/sched.cpp 571e00d303009a940f17c8ed4582749a718e846d 
> 
> Diff: https://reviews.apache.org/r/39060/diff/
> 
> 
> Testing
> ---
> 
> make check successful 
> Created 100 framework instances on a 24 CPU machine. Master failover detected 
> by the framework process and continue to work as expected.
> 
> 
> Thanks,
> 
> Mandeep Chadha
> 
>



Re: Review Request 39060: Create master detector per url & not per framework

2015-10-08 Thread Mandeep Chadha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39060/
---

(Updated Oct. 8, 2015, 5:05 p.m.)


Review request for mesos and Joris Van Remoortere.


Bugs: MESOS-3595
https://issues.apache.org/jira/browse/MESOS-3595


Repository: mesos


Description
---

If the number of framework created exceeds the lib process
threads then during master failover the zookeeper updates can
cause deadlock.


Diffs (updated)
-

  src/sched/sched.cpp 571e00d303009a940f17c8ed4582749a718e846d 

Diff: https://reviews.apache.org/r/39060/diff/


Testing
---

make check successful 
Created 100 framework instances on a 24 CPU machine. Master failover detected 
by the framework process and continue to work as expected.


Thanks,

Mandeep Chadha



Re: Review Request 39060: Create master detector per url & not per framework

2015-10-07 Thread Mandeep Chadha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39060/
---

(Updated Oct. 7, 2015, 9:42 p.m.)


Review request for mesos and Joris Van Remoortere.


Bugs: MESOS-3595
https://issues.apache.org/jira/browse/MESOS-3595


Repository: mesos


Description
---

If the number of framework created exceeds the lib process
threads then during master failover the zookeeper updates can
cause deadlock.


Diffs (updated)
-

  src/sched/sched.cpp 571e00d303009a940f17c8ed4582749a718e846d 
  src/slave/containerizer/provisioner/docker/registry_client.hpp 
fdb68b675582f603ffb3e96f31c1c9405e238567 
  src/slave/containerizer/provisioner/docker/registry_client.cpp 
4931ae8869a697b1e9d8d4cbc0a871e7cd506285 

Diff: https://reviews.apache.org/r/39060/diff/


Testing
---

make check successful 
Created 100 framework instances on a 24 CPU machine. Master failover detected 
by the framework process and continue to work as expected.


Thanks,

Mandeep Chadha



Re: Review Request 39060: Create master detector per url & not per framework

2015-10-07 Thread Mandeep Chadha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39060/
---

(Updated Oct. 7, 2015, 9:49 p.m.)


Review request for mesos and Joris Van Remoortere.


Bugs: MESOS-3595
https://issues.apache.org/jira/browse/MESOS-3595


Repository: mesos


Description
---

If the number of framework created exceeds the lib process
threads then during master failover the zookeeper updates can
cause deadlock.


Diffs (updated)
-

  src/sched/sched.cpp 571e00d303009a940f17c8ed4582749a718e846d 

Diff: https://reviews.apache.org/r/39060/diff/


Testing
---

make check successful 
Created 100 framework instances on a 24 CPU machine. Master failover detected 
by the framework process and continue to work as expected.


Thanks,

Mandeep Chadha



Review Request 39056: Fix for Mesos master crash due to check failure.

2015-10-06 Thread Mandeep Chadha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39056/
---

Review request for mesos.


Bugs: MESOS-3552
https://issues.apache.org/jira/browse/MESOS-3552


Repository: mesos


Description
---

Check failed due to double comparison : MESOS-3552.


Diffs
-

  src/common/resources.cpp 601388c35a1bff37c58e753d1870d53b8d0af2d1 
  src/tests/reservation_tests.cpp 6b7c43c8b5c64618249dbee926383242320c111e 

Diff: https://reviews.apache.org/r/39056/diff/


Testing
---

Added unit test.
make check successful.


Thanks,

Mandeep Chadha



Review Request 39060: Create master detector per url & not per framework

2015-10-06 Thread Mandeep Chadha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39060/
---

Review request for mesos.


Bugs: MESOS-3595
https://issues.apache.org/jira/browse/MESOS-3595


Repository: mesos


Description
---

If the number of framework created exceeds the lib process
threads then during master failover the zookeeper updates can
cause deadlock.


Diffs
-

  src/sched/sched.cpp 571e00d303009a940f17c8ed4582749a718e846d 

Diff: https://reviews.apache.org/r/39060/diff/


Testing
---

make check successful 
Created 100 framework instances on a 24 CPU machine. Master failover detected 
by the framework process and continue to work as expected.


Thanks,

Mandeep Chadha



Re: Review Request 39056: Fix for Mesos master crash due to check failure.

2015-10-06 Thread Mandeep Chadha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39056/
---

(Updated Oct. 6, 2015, 9:33 p.m.)


Review request for mesos and Neil Conway.


Bugs: MESOS-3552
https://issues.apache.org/jira/browse/MESOS-3552


Repository: mesos


Description
---

Check failed due to double comparison : MESOS-3552.


Diffs (updated)
-

  src/common/resources.cpp 601388c35a1bff37c58e753d1870d53b8d0af2d1 
  src/tests/reservation_tests.cpp 6b7c43c8b5c64618249dbee926383242320c111e 
  src/v1/resources.cpp dc868903472f8f3a1ddc56092e3f8f81d953ce39 

Diff: https://reviews.apache.org/r/39056/diff/


Testing
---

Added unit test.
make check successful.


Thanks,

Mandeep Chadha



Re: Review Request 39056: Fix for Mesos master crash due to check failure.

2015-10-06 Thread Mandeep Chadha


> On Oct. 6, 2015, 6:34 p.m., Neil Conway wrote:
> > src/common/resources.cpp, line 874
> > <https://reviews.apache.org/r/39056/diff/1/?file=1091951#file1091951line874>
> >
> > This change should also be applied to Resources::apply() in 
> > src/v1/resources.cpp

Thanks Neil.


- Mandeep


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39056/#review101674
-------


On Oct. 6, 2015, 9:33 p.m., Mandeep Chadha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39056/
> ---
> 
> (Updated Oct. 6, 2015, 9:33 p.m.)
> 
> 
> Review request for mesos and Neil Conway.
> 
> 
> Bugs: MESOS-3552
> https://issues.apache.org/jira/browse/MESOS-3552
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Check failed due to double comparison : MESOS-3552.
> 
> 
> Diffs
> -
> 
>   src/common/resources.cpp 601388c35a1bff37c58e753d1870d53b8d0af2d1 
>   src/tests/reservation_tests.cpp 6b7c43c8b5c64618249dbee926383242320c111e 
>   src/v1/resources.cpp dc868903472f8f3a1ddc56092e3f8f81d953ce39 
> 
> Diff: https://reviews.apache.org/r/39056/diff/
> 
> 
> Testing
> ---
> 
> Added unit test.
> make check successful.
> 
> 
> Thanks,
> 
> Mandeep Chadha
> 
>



Re: Review Request 39056: Fix for Mesos master crash due to check failure.

2015-10-06 Thread Mandeep Chadha


> On Oct. 6, 2015, 9:47 p.m., Jie Yu wrote:
> > src/common/resources.cpp, line 880
> > <https://reviews.apache.org/r/39056/diff/2/?file=1092068#file1092068line880>
> >
> > Can you introduce a `CHECK_DOUBLE_EQ` in `stout/check.hpp`, similar to
> > 
> > https://code.google.com/p/googletest/wiki/AdvancedGuide#Floating-Point_Macros
> > 
> > To choose a default epsilon, you probably want to check the above link. 
> > Gtest uses "Units in the Last Place (ULPs)" as the default.
> 
> Mandeep Chadha wrote:
> CHECK_DOUBLE_EQ will also fail. 
> 
> F0930 18:15:38.169140 26984 resources.cpp:874] Check failed: 
> (result.cpus().get()) >= (cpus().get())-0.001L (24 vs. 24) *** 
> Check failure stack trace: *** F0930 18:15:38.169322 26991 resources.cpp:874] 
> Check failed: (result.cpus().get()) >= (cpus().get())-0.001L (24 
> vs. 24) *** Check failure stack trace: ***
> 
> CHECK_NEAR() is the right Macro to use. 
> 
> 
> But we also need to check (isNone() and isNone()) equality and hence the 
> present implementation. I think the long term fix is to wrap double into 
> Double and have the opeartor== to the right thing.
> 
> Jie Yu wrote:
> How do you implemente CHECK_DOUBLE_EQ? You copied from gtest code?

you can just call it:

CHECK_DOUBLE_EQ(result.cpus.get(), cpus.get());

src/common/resources.cpp ( includes the glog/logging.h and all the Macros are 
available).

#include 


- Mandeep


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39056/#review101691
---


On Oct. 6, 2015, 10:07 p.m., Mandeep Chadha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39056/
> ---
> 
> (Updated Oct. 6, 2015, 10:07 p.m.)
> 
> 
> Review request for mesos and Neil Conway.
> 
> 
> Bugs: MESOS-3552
> https://issues.apache.org/jira/browse/MESOS-3552
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Check failed due to double comparison : MESOS-3552.
> 
> 
> Diffs
> -
> 
>   src/common/resources.cpp 601388c35a1bff37c58e753d1870d53b8d0af2d1 
>   src/tests/reservation_tests.cpp 6b7c43c8b5c64618249dbee926383242320c111e 
>   src/v1/resources.cpp dc868903472f8f3a1ddc56092e3f8f81d953ce39 
> 
> Diff: https://reviews.apache.org/r/39056/diff/
> 
> 
> Testing
> ---
> 
> Added unit test.
> make check successful.
> 
> 
> Thanks,
> 
> Mandeep Chadha
> 
>



Re: Review Request 39056: Fix for Mesos master crash due to check failure.

2015-10-06 Thread Mandeep Chadha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39056/
---

(Updated Oct. 6, 2015, 10:07 p.m.)


Review request for mesos and Neil Conway.


Bugs: MESOS-3552
https://issues.apache.org/jira/browse/MESOS-3552


Repository: mesos


Description
---

Check failed due to double comparison : MESOS-3552.


Diffs (updated)
-

  src/common/resources.cpp 601388c35a1bff37c58e753d1870d53b8d0af2d1 
  src/tests/reservation_tests.cpp 6b7c43c8b5c64618249dbee926383242320c111e 
  src/v1/resources.cpp dc868903472f8f3a1ddc56092e3f8f81d953ce39 

Diff: https://reviews.apache.org/r/39056/diff/


Testing
---

Added unit test.
make check successful.


Thanks,

Mandeep Chadha