Hi Gil,

I would have to disagree, as in this case I believe there is CO due to the threading model, CO on a per-thread basis, as well as plain old omission. I believe these conditions are in addition to the conditions you're pointing to.

You may test at a fixed rate for HFT, but in most worlds random arrival times are necessary. Unfortunately, that makes the problem more difficult to deal with.

Regards,
Kirk
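For concreteness, one common way to produce the random arrivals Kirk refers to is a Poisson process, where the gaps between planned send times are exponentially distributed around the target rate. The sketch below is purely illustrative; the class and method names are invented for this example and are not part of JMeter.

    import java.util.Random;

    // Illustrative sketch: pre-computing a randomized (Poisson) arrival schedule.
    public class PoissonSchedule {

        // Returns 'count' absolute send times (in nanos), starting at startNanos.
        public static long[] plan(double requestsPerSecond, int count, long startNanos) {
            Random rnd = new Random();
            double meanGapNanos = 1_000_000_000.0 / requestsPerSecond;
            long[] sendTimes = new long[count];
            long next = startNanos;
            for (int i = 0; i < count; i++) {
                sendTimes[i] = next;
                // Exponentially distributed gap: -mean * ln(U), with U uniform in (0, 1].
                next += (long) (-meanGapNanos * Math.log(1.0 - rnd.nextDouble()));
            }
            return sendTimes;
        }
    }

Because the planned send times are fixed before the run begins, a late send is still detectable afterwards by comparing actual send times against the plan; what a random schedule loses, as discussed below, is the repeating pattern a corrector could project from.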
On 2013-10-18, at 5:32 PM, Gil Tene <g...@azulsystems.com> wrote:

> I don't think the thread model is the core of the Coordinated Omission problem. Unless we consider the only solution to be sending no more than one request per 20 minutes from any given thread a threading-model fix. It's more of a configuration choice the way I see it, but a pretty impossible one. The thread model may need work for other reasons, but CO is not one of them.
>
> In JMeter, as with all other synchronous testers, Coordinated Omission is a per-thread issue. It's easy to demonstrate CO with JMeter with a single client thread testing an application that has only a single client connection in the real world, or with 15 client threads testing an application that has exactly 15 real-world clients communicating at high rates (common with muxed environments, messaging, ESBs, trading systems, etc.). No amount of threading or concurrency will help capture better test results for these very real systems. Any occurrence of CO will make the JMeter results seriously bogus.
>
> When any one thread misses a planned request sending time, CO has already occurred, and there is no way to avoid it at that point. You can certainly detect that CO has happened. The question is what to do about it in JMeter once you detect it. The major options are:
>
> 1. Ignore it and keep working with the data as if it actually meant anything. This amounts to http://tinyurl.com/o46doqf .
>
> 2. You can try to change the tester behavior to avoid CO going forward. E.g. you can try to adjust the number of threads up AND, at the same time, the frequency at which each thread sends requests, which amounts to drastically changing the test plan in reaction to system behavior. In my opinion, changing behavior dynamically will have very limited effectiveness, for two reasons. The first is that the problem has already occurred, so all the data up to and including the observed CO is already bogus and has to be thrown away unless it can be corrected somehow. Only after you have auto-adjusted enough times to stop seeing CO for a long stretch can the results collected during that stretch be considered valid. The second is that changing the test scenario is valid (and possible) for very few real-world systems.
>
> 3. You can try to correct for CO when you observe it. There are various ways this can be done, and most of them amount to re-creating the missing test sample results by projecting from past results. This can help correct the results data set so that it better approximates what a tester that was not synchronous, and had kept issuing requests per the actual test plan, would have experienced in the test.
>
> 4. Something else we haven't yet thought of.
>
> Some correction and detection example work can be found at https://github.com/OutlierCorrector/jmeter/commit/34c34cae673fd0871a423035a9f262d049f3d9e9 , which uses code at https://github.com/OutlierCorrector/OutlierCorrector . Michael Chmiel worked at Azul Systems over the summer on this problem, and the OutlierCorrector package and the small patch to JMeter (under the docs-2.9 branch) are some of the results of that work. This fix approach appears to work well as long as no explicitly random behavior is stated in the test scenarios (the outlier detector detects a test pattern and repeats it in repairing the data; expressly random scenarios will not exhibit a detectable pattern).
>
> -- Gil.
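As a concrete illustration of option 3: the HdrHistogram library mentioned later in this thread exposes a recordValueWithExpectedInterval method that performs exactly this kind of projection, back-filling the samples a stalled synchronous tester failed to issue. A minimal sketch, assuming a test plan with a constant expected interval between requests:

    import org.HdrHistogram.Histogram;

    // Minimal sketch of option 3: correcting recorded latencies for CO.
    public class CoCorrectedRecorder {
        // Track up to 1 hour in nanoseconds, at 3 significant decimal digits.
        private final Histogram histogram = new Histogram(3_600_000_000_000L, 3);
        private final long expectedIntervalNanos;

        public CoCorrectedRecorder(long expectedIntervalNanos) {
            this.expectedIntervalNanos = expectedIntervalNanos;
        }

        public void record(long latencyNanos) {
            // If latencyNanos exceeds the expected interval, HdrHistogram also
            // records the phantom samples a non-synchronous tester would have
            // seen (latency - interval, latency - 2*interval, ...).
            histogram.recordValueWithExpectedInterval(latencyNanos, expectedIntervalNanos);
        }

        public long valueAtPercentile(double percentile) {
            return histogram.getValueAtPercentile(percentile);
        }
    }

Note that the correction leans on a known constant interval; as the paragraph above says, expressly random scenarios give the corrector no pattern to project from.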
> On Oct 17, 2013, at 11:47 PM, Kirk Pepperdine <kirk.pepperd...@gmail.com> wrote:
>
>> Hi Sebb,
>>
>> In my testing, the option of creating threads on demand instead of all at once has made a huge difference in my ability to control the rate of arrivals at the server. It has convinced me that simply using the throughput controller isn't enough and that the threading model in JMeter *must* change. It is the threading model that is the biggest source of CO in JMeter. Unfortunately, we weren't able to agree on a non-disruptive way of changing JMeter to make this happen.
>>
>> The model I was proposing would have JMeter generate an event heap sorted by the time at which a sampler should be fired. A thread pool would pull events off the heap and fire them as scheduled. This would allow JMeter to break the inappropriate one-thread-per-user relationship. The solution is not perfect, in that you will still have to fight with thread schedulers and hypervisors to get things to happen on cue. However, I believe the end result would be a far more scalable product that requires far fewer threads to produce far higher loads on the server.
>>
>> As for your idea of using the throughput controller: IMHO, triggering an assert only worsens the CO problem. In fact, if the response times from the timeouts are not added into the results (in other words, they are omitted from the data set), you've only made the problem worse, because you are filtering bad data points out of the result sets, making the results look better than they should be. Peter Lawrey's (included here for the purpose of this discussion) technique for correcting CO is simply to recognize when the event should have been triggered and to start the timer for that event at that time. The latency reported will then include the time spent waiting before the event actually fired.
>>
>> Gil Tene has done some work with JMeter; I'll leave it up to him to post what he's done. The interesting bit he's created is HdrHistogram (https://github.com/giltene/HdrHistogram). It is not only a better way to report results, it also offers techniques to calculate and correct for CO. Gil might also be able to point you to a more recent version of his talk on CO. It might be nice to have a new sampler that incorporates this work.
>>
>> On a side note, I've got a Servlet filter, a JMX component, that measures a bunch of stats from the server's point of view. It's something that could be contributed, as it could be used to help understand the source of CO, if not just complement JMeter's view of latency.
>>
>> Regards,
>> Kirk
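A minimal sketch of the event-heap model Kirk describes, with a DelayQueue standing in for the time-sorted heap and latency measured from the planned fire time, per Peter Lawrey's technique. All names here are invented for illustration; this is not JMeter code.

    import java.util.concurrent.DelayQueue;
    import java.util.concurrent.Delayed;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    // Sketch: a time-sorted event heap consumed by a small worker pool,
    // breaking the one-thread-per-user coupling.
    public class EventHeapRunner {

        // A sampler firing, scheduled for an absolute System.nanoTime() deadline.
        static class ScheduledSample implements Delayed {
            final long fireAtNanos;
            final Runnable sampler;

            ScheduledSample(long fireAtNanos, Runnable sampler) {
                this.fireAtNanos = fireAtNanos;
                this.sampler = sampler;
            }

            @Override
            public long getDelay(TimeUnit unit) {
                return unit.convert(fireAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
            }

            @Override
            public int compareTo(Delayed other) {
                return Long.compare(fireAtNanos, ((ScheduledSample) other).fireAtNanos);
            }
        }

        private final DelayQueue<ScheduledSample> heap = new DelayQueue<>();

        public EventHeapRunner(int workers) {
            ExecutorService pool = Executors.newFixedThreadPool(workers);
            for (int i = 0; i < workers; i++) {
                pool.execute(() -> {
                    try {
                        while (!Thread.currentThread().isInterrupted()) {
                            ScheduledSample s = heap.take(); // blocks until an event is due
                            s.sampler.run();
                            // Per Peter Lawrey's technique: the clock starts at the
                            // planned fire time, so any lateness in dequeuing shows
                            // up as latency rather than being silently omitted.
                            long latencyNanos = System.nanoTime() - s.fireAtNanos;
                            // record(latencyNanos) ...
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
        }

        public void schedule(long fireAtNanos, Runnable sampler) {
            heap.put(new ScheduledSample(fireAtNanos, sampler));
        }
    }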
>> On 2013-10-18, at 12:27 AM, sebb <seb...@gmail.com> wrote:
>>
>>> It looks to be quite difficult to avoid the issue of Coordinated Omission without a major redesign of JMeter.
>>>
>>> However, it may be a lot easier to detect when the condition has occurred. This would potentially allow the test settings to be changed to reduce or eliminate the occurrences, e.g. by increasing the number of threads or by spreading the load across more JMeter instances.
>>>
>>> The Constant Throughput Controller calculates the desired wait time, and if this is less than zero (i.e. a sample should already have been generated), it could trigger the creation of a failed Assertion showing the time difference.
>>>
>>> Would this be sufficient to detect all CO occurrences? If not, what other metric needs to be checked?
>>>
>>> Even if it is not the only possible cause, would it be useful as a starting point?
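To make sebb's suggestion concrete, here is a sketch of what the negative-wait check might look like. The names below are hypothetical and invented for illustration; they do not correspond to JMeter's actual Constant Throughput Timer internals.

    // Hypothetical sketch of sebb's suggestion: flag CO whenever the pacing
    // calculation yields a negative wait (the sample is already late).
    public class ThroughputPacer {
        private final long intervalMillis;   // planned gap between samples
        private long nextPlannedMillis;      // next planned send time

        public ThroughputPacer(double samplesPerMinute, long startMillis) {
            this.intervalMillis = (long) (60_000.0 / samplesPerMinute);
            this.nextPlannedMillis = startMillis;
        }

        // Returns ms to wait before the next sample; a negative value means the
        // planned send time has already passed by that many ms (CO has occurred).
        public long nextDelay(long nowMillis) {
            long delay = nextPlannedMillis - nowMillis;
            nextPlannedMillis += intervalMillis;
            if (delay < 0) {
                // This is where a failed Assertion carrying the time difference
                // could be raised, rather than silently absorbing the lateness.
                System.err.printf("CO detected: sample is %d ms late%n", -delay);
            }
            return delay;
        }
    }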