Re: Review Request 50017: WIP: Validated the resources when parsing it.

Guangya Liu Sun, 17 Jul 2016 08:17:24 -0700


> On 七月 14, 2016, 5:56 p.m., Benjamin Mahler wrote:
> > include/mesos/resources.hpp, lines 135-147
> > <https://reviews.apache.org/r/50017/diff/1/?file=1443931#file1443931line135>
> >
> >     This suggests a change of semantics to `Resources`.
> >     
> >     Currently: `Resources` always refers to a valid collection resources. 
> > This function would never return Some(Error) unless we change the semantics.
> >     
> >     If we want to remove the unnecessary validation, we can do this by 
> > making '`operator += (const Resources& that)`' avoid validating '`that`' 
> > since it's already valid. Currently it just calls '`operator += (const 
> > Resource& r)`' which is unaware of '`r`' already being valid since it comes 
> > from a '`Resources`'.
> >     
> >     However, we have to keep the validation in '`operator += (const 
> > Resource& r)`', since '`r`' is just a protobuf coming from an arbitrary 
> > caller, it may be invalid and we need to validate.
> >     
> >     Make sense?
> 
> Guangya Liu wrote:
>     Hi Ben, does there are any example in mesos code for `Resource` object 
> come from protobuf? I found that we are always using `Resource::parse` to get 
> a `Resource` or `Resources` object and I have added some validation logic in 
> `Resource::parse` in when getting `Resource` or `Resources` object.
> 
> Guangya Liu wrote:
>     Ben, got your point. We can set `Resource` object as following:
>     
>     ```javascript 
>     Resource scalar;
>     scalar.set_name("cpus");
>     scalar.set_type(Value::SCALAR);
>     scalar.mutable_scalar()->set_value(1.234);
>     ```
>     
>     So if we igore the validation for `'operator += (const Resource& r)'`, 
> there might be some erros if the resource was not constructed correctly. This 
> is not a good way to create `Resource` object but we cannot guard the usage 
> for this now.
>     
>     I found that the API `'operator += (const Resource& r)'` was not called 
> directly too frequently, but we are using this API `'operator += (const 
> Resources& that)'` for the most of the time, so seems it is good to keep the 
> validation for `'operator += (const Resource& r)'` as it was not called 
> directly too much. 
>     
>     The problem is that `'operator += (const Resources& that)'` is calling 
> `'operator += (const Resource& r)'`, I did not found a good way to keep 
> validation for `'operator += (const Resource& r)'` but ignore the validation 
> for `'operator += (const Resources& that)'`, still checking how we can make 
> this work, any comments/suggestions for this?
> 
> Guangya Liu wrote:
>     Did more test by removing validation, the performance of resources 
> operations with `port` resources increased more than 5x.
>     
>     With validation
>     ```
>     [==========] Running 1 test from 1 test case.
>     [----------] Global test environment set-up.
>     [----------] 1 test from ResourcesOperators/Resources_BENCHMARK_Test
>     [ RUN      ] ResourcesOperators/Resources_BENCHMARK_Test.Arithmetic/2
>     Took 11.212199secs to perform 1000 'total += r' operations on 
> ports(*):[1-2, 4-5, 7-8, 10-11, 13-14, 16-17, 1...
>     Took 8.145903secs to perform 1000 'total -= r' operations on 
> ports(*):[1-2, 4-5, 7-8, 10-11, 13-14, 16-17, 1...
>     Took 11.154993secs to perform 1000 'total = total + r' operations on 
> ports(*):[1-2, 4-5, 7-8, 10-11, 13-14, 16-17, 1...
>     Took 7.974004secs to perform 1000 'total = total - r' operations on 
> ports(*):[1-2, 4-5, 7-8, 10-11, 13-14, 16-17, 1...
>     [       OK ] ResourcesOperators/Resources_BENCHMARK_Test.Arithmetic/2 
> (38490 ms)
>     [----------] 1 test from ResourcesOperators/Resources_BENCHMARK_Test 
> (38490 ms total)
>     
>     [----------] Global test environment tear-down
>     [==========] 1 test from 1 test case ran. (38512 ms total)
>     [  PASSED  ] 1 test.
>     ```
>     
>     Without validation
>     ```
>     [==========] Running 1 test from 1 test case.
>     [----------] Global test environment set-up.
>     [----------] 1 test from ResourcesOperators/Resources_BENCHMARK_Test
>     [ RUN      ] ResourcesOperators/Resources_BENCHMARK_Test.Arithmetic/2
>     Took 2.90743secs to perform 1000 'total += r' operations on 
> ports(*):[1-2, 4-5, 7-8, 10-11, 13-14, 16-17, 1...
>     Took 9166us to perform 1000 'total -= r' operations on ports(*):[1-2, 
> 4-5, 7-8, 10-11, 13-14, 16-17, 1...
>     Took 3.038083secs to perform 1000 'total = total + r' operations on 
> ports(*):[1-2, 4-5, 7-8, 10-11, 13-14, 16-17, 1...
>     Took 9253us to perform 1000 'total = total - r' operations on 
> ports(*):[1-2, 4-5, 7-8, 10-11, 13-14, 16-17, 1...
>     [       OK ] ResourcesOperators/Resources_BENCHMARK_Test.Arithmetic/2 
> (5966 ms)
>     [----------] 1 test from ResourcesOperators/Resources_BENCHMARK_Test 
> (5966 ms total)
>     
>     [----------] Global test environment tear-down
>     [==========] 1 test from 1 test case ran. (5988 ms total)
>     [  PASSED  ] 1 test.
>     ```
>     
>     Also after more checking, I think that we can remove the validation from 
> resource operations, as we already have resource validation when launching 
> tasks, creating resurces etc. Even though application developer may create 
> resources using protobuf directly, but if the resource definition is not 
> right, the validation for task resource will be failed.


Sorter test, the sorter add allocation increased 10x. It is strange that the 
benchmark test for allocation test in hierarchical test does not improve much, 
still checking...

With validation:
```
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from AgentAndClientCount/Sorter_BENCHMARK_Test
[ RUN      ] AgentAndClientCount/Sorter_BENCHMARK_Test.FullSort/35
Using 50000 agents and 1000 clients
Added 1000 clients in 19572us
Added 50000 agents in 915932us
Added allocations for 50000 agents in 40.303108secs
Full sort of 1000 clients took 32702us
No-op sort of 1000 clients took 314us
[       OK ] AgentAndClientCount/Sorter_BENCHMARK_Test.FullSort/35 (42449 ms)
[----------] 1 test from AgentAndClientCount/Sorter_BENCHMARK_Test (42449 ms 
total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (42466 ms total)
[  PASSED  ] 1 test.
```

Without validation:
```
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from AgentAndClientCount/Sorter_BENCHMARK_Test
[ RUN      ] AgentAndClientCount/Sorter_BENCHMARK_Test.FullSort/35
Using 50000 agents and 1000 clients
Added 1000 clients in 22764us
Added 50000 agents in 862959us
Added allocations for 50000 agents in 3.934831secs
Full sort of 1000 clients took 30187us
No-op sort of 1000 clients took 322us
[       OK ] AgentAndClientCount/Sorter_BENCHMARK_Test.FullSort/35 (6031 ms)
[----------] 1 test from AgentAndClientCount/Sorter_BENCHMARK_Test (6031 ms 
total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (6049 ms total)
[  PASSED  ] 1 test.
```


- Guangya


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50017/#review142257
-----------------------------------------------------------


On 七月 14, 2016, 3:59 p.m., Guangya Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50017/
> -----------------------------------------------------------
> 
> (Updated 七月 14, 2016, 3:59 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Klaus Ma.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The "validation" API was called in a huge number of times, but this
> was only needed when parsing resources and we do not need to do
> other validation when doing resources operations, such as add etc.
> 
> 
> Diffs
> -----
> 
>   include/mesos/resources.hpp a557e97c65194d4aad879fb88d8edefd1c95b8d8 
>   include/mesos/v1/resources.hpp a5ba8fec4c9c3643646308f75a4b28cefe0b3df3 
>   src/common/resources.cpp f6ff92b591c15bc8e93fd85e1896349c3a7bb968 
>   src/v1/resources.cpp 8c3f2d1c1529915a59d47fe37bb3fc7a3267079a 
> 
> Diff: https://reviews.apache.org/r/50017/diff/
> 
> 
> Testing
> -------
> 
> The `qcachegrind` result here: 
> https://docs.google.com/document/d/1oilen04e8trIOgYbj-jxAd7esTRDaySll8y2BQ6p5nU/edit?usp=sharing
> 
> After the fix, the performance of `adding` resources increased 30+% when 
> adding 50000 agents to the cluster.
> 
> Before fix:
> [ RUN      ] AgentAndClientCount/Sorter_BENCHMARK_Test.FullSort/30
> Using 50000 agents and 1 clients
> Added 1 clients in 47us
> **Added 50000 agents in 1.312497secs**
> Sorted 1 clients in 43us
> [       OK ] AgentAndClientCount/Sorter_BENCHMARK_Test.FullSort/30 (1321 ms)
> [ RUN      ] AgentAndClientCount/Sorter_BENCHMARK_Test.FullSort/31
> Using 50000 agents and 50 clients
> Added 50 clients in 948us
> **Added 50000 agents in 1.325987secs**
> Sorted 50 clients in 1165us
> [       OK ] AgentAndClientCount/Sorter_BENCHMARK_Test.FullSort/31 (1340 ms)
> [ RUN      ] AgentAndClientCount/Sorter_BENCHMARK_Test.FullSort/32
> Using 50000 agents and 100 clients
> Added 100 clients in 1697us
> **Added 50000 agents in 1.409478secs**
> Sorted 100 clients in 2876us
> [       OK ] AgentAndClientCount/Sorter_BENCHMARK_Test.FullSort/32 (1432 ms)
> [ RUN      ] AgentAndClientCount/Sorter_BENCHMARK_Test.FullSort/33
> Using 50000 agents and 200 clients
> Added 200 clients in 4553us
> **Added 50000 agents in 1.371473secs**
> Sorted 200 clients in 5371us
> [       OK ] AgentAndClientCount/Sorter_BENCHMARK_Test.FullSort/33 (1412 ms)
> [ RUN      ] AgentAndClientCount/Sorter_BENCHMARK_Test.FullSort/34
> Using 50000 agents and 500 clients
> Added 500 clients in 8836us
> **Added 50000 agents in 1.304245secs**
> Sorted 500 clients in 14697us
> [       OK ] AgentAndClientCount/Sorter_BENCHMARK_Test.FullSort/34 (1387 ms)
> [ RUN      ] AgentAndClientCount/Sorter_BENCHMARK_Test.FullSort/35
> Using 50000 agents and 1000 clients
> Added 1000 clients in 19508us
> **Added 50000 agents in 1.270555secs**
> Sorted 1000 clients in 32575us
> [       OK ] AgentAndClientCount/Sorter_BENCHMARK_Test.FullSort/35 (1433 ms)
> 
> 
> After the fix:
> [ RUN      ] AgentAndClientCount/Sorter_BENCHMARK_Test.FullSort/30
> Using 50000 agents and 1 clients
> Added 1 clients in 42us
> **Added 50000 agents in 891266us**
> Sorted 1 clients in 59us
> [       OK ] AgentAndClientCount/Sorter_BENCHMARK_Test.FullSort/30 (902 ms)
> [ RUN      ] AgentAndClientCount/Sorter_BENCHMARK_Test.FullSort/31
> Using 50000 agents and 50 clients
> Added 50 clients in 933us
> **Added 50000 agents in 885006us**
> Sorted 50 clients in 1220us
> [       OK ] AgentAndClientCount/Sorter_BENCHMARK_Test.FullSort/31 (899 ms)
> [ RUN      ] AgentAndClientCount/Sorter_BENCHMARK_Test.FullSort/32
> Using 50000 agents and 100 clients
> Added 100 clients in 1879us
> **Added 50000 agents in 903112us**
> Sorted 100 clients in 2800us
> [       OK ] AgentAndClientCount/Sorter_BENCHMARK_Test.FullSort/32 (922 ms)
> [ RUN      ] AgentAndClientCount/Sorter_BENCHMARK_Test.FullSort/33
> Using 50000 agents and 200 clients
> Added 200 clients in 3893us
> **Added 50000 agents in 881240us**
> Sorted 200 clients in 5802us
> [       OK ] AgentAndClientCount/Sorter_BENCHMARK_Test.FullSort/33 (912 ms)
> [ RUN      ] AgentAndClientCount/Sorter_BENCHMARK_Test.FullSort/34
> Using 50000 agents and 500 clients
> Added 500 clients in 10712us
> **Added 50000 agents in 877442us**
> Sorted 500 clients in 17887us
> [       OK ] AgentAndClientCount/Sorter_BENCHMARK_Test.FullSort/34 (949 ms)
> [ RUN      ] AgentAndClientCount/Sorter_BENCHMARK_Test.FullSort/35
> Using 50000 agents and 1000 clients
> Added 1000 clients in 21472us
> **Added 50000 agents in 916653us**
> Sorted 1000 clients in 37369us
> [       OK ] AgentAndClientCount/Sorter_BENCHMARK_Test.FullSort/35 (1057 ms)
> 
> 
> Thanks,
> 
> Guangya Liu
> 
>

Re: Review Request 50017: WIP: Validated the resources when parsing it.

Reply via email to