Re: Coordinated Omission (CO) - possible strategies

Kirk Pepperdine Sun, 20 Oct 2013 10:50:55 -0700

On 2013-10-19, at 9:56 AM, Gil Tene <[email protected]> wrote:

> To focus on the "how to deal with Coordinated Omission" part:
> 
> There are two main ways to deal with CO in your actual executed behavior:
> 
> 1. Change the behavior to avoid CO to begin with. 
> 
> 2. Detect it and correct it.
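A minimal sketch of option 2, assuming the load generator knows the interval at which it intended to send requests. When a recorded latency exceeds that interval, the requests that should have been sent during the stall are backfilled with linearly decreasing latencies. The class and method names are hypothetical and are not JMeter code; this only illustrates the general idea of expected-interval correction.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of JMeter: corrects a recorded latency for
// coordinated omission by backfilling the samples a stalled sender skipped.
class CoCorrection {

    // Returns the observed latency plus one synthetic sample for each request
    // that should have been sent while the sender was blocked.
    static List<Long> correct(long latencyMs, long expectedIntervalMs) {
        List<Long> samples = new ArrayList<>();
        samples.add(latencyMs);
        if (expectedIntervalMs <= 0) {
            return samples; // no intended rate known, nothing to backfill
        }
        // A request due k intervals after the stalled one would have waited
        // roughly latencyMs - k * expectedIntervalMs.
        for (long missed = latencyMs - expectedIntervalMs;
             missed >= expectedIntervalMs;
             missed -= expectedIntervalMs) {
            samples.add(missed);
        }
        return samples;
    }

    public static void main(String[] args) {
        // One 1000 ms stall with an intended 250 ms send interval.
        System.out.println(correct(1000, 250)); // prints [1000, 750, 500, 250]
    }
}
```

Without the correction, the stall contributes a single 1000 ms sample; with it, the three requests that were never sent are also counted.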


I'd add "detect and report". I believe it has value beyond telling you that 
you can't believe the data: it tells you that there is a condition you need to 
eliminate from your test.
> 
> There is a "detect it and report it" one too, but I don't think it is of any 
> real use, as detection without correction will just tell you your data can't 
> be believed at all, but won't tell you anything about what can be. Since CO 
> can move percentile magnitudes and position by literally multiple orders of 
> magnitude (I have multiple measured real-world production behaviors that show 
> this), "hoping it is not too bad" when you know it is there amounts to 
> burying your head in the sand.
> 
> So Kirk, is the random behavior you need one of random timing, or random 
> operation sequencing (or both)?

I need operations to occur at a random interval. That said, the interval is 
"random" to the server and *not* to JMeter. JMeter can pre-calculate when 
certain events should occur and then detect when it misses that target. The 
easiest way to do this is to build an event (sampler??) queue that understands 
when things such as the next HTTP sampler should be fired.
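A rough sketch of such a queue, under stated assumptions: fire times are pre-calculated with exponentially distributed gaps (one possible choice of "random"), and each event reports how far past its target it actually fired. `ScheduledEventQueue`, its methods, and the 1 ms tolerance are all hypothetical; none of this is existing JMeter API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Random;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not JMeter code: a queue of pre-calculated fire times
// that reports how late each event was actually fired.
class ScheduledEventQueue {
    private final Deque<Long> targetsNanos = new ArrayDeque<>();

    // Pre-calculates 'count' fire times with exponentially distributed gaps,
    // so arrivals look random to the server but are known to the client.
    ScheduledEventQueue(long startNanos, double meanGapNanos, int count, long seed) {
        Random random = new Random(seed);
        long next = startNanos;
        for (int i = 0; i < count; i++) {
            next += (long) (-meanGapNanos * Math.log(1.0 - random.nextDouble()));
            targetsNanos.add(next);
        }
    }

    boolean hasNext() {
        return !targetsNanos.isEmpty();
    }

    // Target time of the next event; throws if the queue is empty.
    long peekTarget() {
        return targetsNanos.getFirst();
    }

    // Removes the next target and returns how far past it 'nowNanos' is
    // (0 if the event fired on time or early).
    long fire(long nowNanos) {
        long target = targetsNanos.removeFirst();
        return Math.max(0L, nowNanos - target);
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledEventQueue queue =
                new ScheduledEventQueue(System.nanoTime(), 5_000_000.0, 20, 42L);
        int missed = 0;
        while (queue.hasNext()) {
            long wait = queue.peekTarget() - System.nanoTime();
            if (wait > 0) {
                TimeUnit.NANOSECONDS.sleep(wait);
            }
            long lateNanos = queue.fire(System.nanoTime());
            // A real sampler call would go here, timed from the target.
            if (lateNanos > 1_000_000L) { // assumed tolerance: 1 ms
                missed++;
            }
        }
        System.out.println("events fired more than 1 ms late: " + missed);
    }
}
```

If a sampler blocks, the targets behind it are fired late and the returned lateness makes the missed schedule visible instead of silently dropping it.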

Regards,
Kirk
 
> 
> Sent from my iPad
> 
> On Oct 18, 2013, at 10:48 PM, "Kirk Pepperdine" <[email protected]> 
> wrote:
> 
>> 
>> On 2013-10-19, at 1:33 AM, Gil Tene <[email protected]> wrote:
>> 
>>> I guess we look at human response back pressure in different ways. It's a 
>>> question of whether or not you consider the humans to be part of the system 
>>> you are testing, and what you think your stats are supposed to represent.
>> 
>> You've seen my presentations and so you know that I do believe that human 
>> and non-human actors are definitively part of the system. They provide the 
>> dynamics for the system being tested. A change in how that layer in my model 
>> works can and does make a huge difference in how the other layers work to 
>> support the overall system.
>>> 
>>> Some people will take the "forgiving" approach, which considers the client 
>>> behavior as part of the overall system behavior. In such an 
>>> approach, if a human responded to slow behavior by not asking any more 
>>> questions for a while, that's simply what the overall system did, and the 
>>> stats reported should reflect only the actual attempts that actual humans 
>>> would have made, including their slowing down their requests in response to slow 
>>> reaction times. 
>> 
>> Sort of. I want to know that a user was inhibited from making forward 
>> progress because the previous step in their workflow blew stated tolerances. 
>> In some cases I'd like to have that user abandon. I'm not sure I'd call this 
>> forgiving, though; I am looking to see what the overall system can do to 
>> answer the question: is it good enough and, if not, why not?
>> 
>> I'm not going to suggest your view is incorrect. I think it's quite valid. I 
>> don't believe the two views are orthogonal; there are elements of 
>> both in each. The question here, in more practical terms, is: what needs to be 
>> done to reduce the level of CO that currently occurs in JMeter, and how 
>> should we react to it? Throwing out entire datasets from runs seems like an 
>> academic answer to a more practical question: will our application stand up 
>> when under load? From my point of view, for JMeter to better answer that 
>> question. 
>> 
>>> 
>>> A web site being completely down for 5 minutes an hour would generate a lot 
>>> of human back pressure response. It may even slow down request rates so 
>>> much during the outage that 99%+ of the overall actual requests by end 
>>> users during an hour that included such a 5 minute outage would still be 
>>> very good. Reporting on those (actual requests by humans) would be very 
>>> different from reporting on what would have happened without human back 
>>> pressure. But it's easy to examine which of the two reporting methods would 
>>> be accepted by a reader of such reports.
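The gap between the two reporting methods can be made concrete with a small calculation. All numbers here are illustrative assumptions rather than measurements: a single client that intends to send one request per second, and an outage that a blocking client records as one slow request.

```java
// Hypothetical numbers: compares the share of slow requests recorded by a
// blocking client with the share recorded by a client that keeps to schedule.
class OutageArithmetic {

    // Fraction of recorded requests that hit the outage when the client
    // blocks: the whole outage is recorded as a single slow request.
    static double blockedClientBadFraction(long hourSec, long outageSec, long intervalSec) {
        long good = (hourSec - outageSec) / intervalSec;
        long bad = 1;
        return (double) bad / (good + bad);
    }

    // Fraction of intended requests that hit the outage when one request is
    // due every intervalSec regardless of earlier responses.
    static double scheduledClientBadFraction(long hourSec, long outageSec, long intervalSec) {
        long intended = hourSec / intervalSec;
        long bad = outageSec / intervalSec;
        return (double) bad / intended;
    }

    public static void main(String[] args) {
        // 5 minute outage in one hour, one intended request per second.
        System.out.printf("blocking client:  %.4f%n", blockedClientBadFraction(3600, 300, 1));
        System.out.printf("scheduled client: %.4f%n", scheduledClientBadFraction(3600, 300, 1));
    }
}
```

Under these assumptions the blocking client reports about 0.03% of requests as bad (1 of 3301), while the scheduled view reports about 8.3% (300 of 3600).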
>> 
>> But then that 5 minute outage is going to show up somewhere and if you bury 
>> it in how you report... that would seem to be a problem. This whole 
>> argument suggests that what you want is a better regime for the treatment of 
>> the data. If that is what you're saying, we're in complete agreement. The 5 
>> minute pause should not be filtered out of the data!
>> 
>> IMHO, the first thing to do is eliminate or reduce the known sources of CO 
>> from JMeter. I'm not sure that tackling the CTT is the best way to go. In 
>> fact I'd prefer a combination of approaches that includes things like how 
>> jHiccup works with a GC STW detector. As you've mentioned before, even with 
>> a fix to the threading model in JMeter, CO will still occur.
>> 
>> Regards,
>> Kirk
>>
