On 01/07/2013 22:32, Robin D. Wilson wrote:
I'm thinking I look at performance testing differently than a lot of people... 
For me, the objective of performance testing is to...


... me too! I have an entirely different point of view about the baseline.
I agree with Robin that user acceptance is hard to define: when I ask, in general terms, about user response, customers tend to give generic answers, like "all pages must be loaded in 3 seconds", etc.

In reality, user acceptance has very much to do with the value of the thing users are doing: in one case, they were doing document uploading and retrieval,
and they found it perfectly normal to wait more than 20 s to get the documents
they were looking for.
Previously, it took more than one hour to get the paper version from a dusty
archive, so 20 s was perfectly acceptable.

So, I run a baseline to get the "best possible behaviour" for the script: usually it is run with a single user, with relaxed think times, while very little load (or no load at all) is running on the system.
Often I take many samples (i.e. many loops), in order to understand the
variability.
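To give an idea of what I mean by "many samples to understand variability", this is roughly how I would summarize a baseline run saved as a CSV-format JTL (a small Python sketch; it assumes the file was written with field names and the default 'label' and 'elapsed' columns, and 'baseline.jtl' is just a placeholder name):

import csv
import statistics
from collections import defaultdict

def summarize(jtl_path):
    """Group baseline samples by sampler label and report their spread."""
    samples = defaultdict(list)
    with open(jtl_path, newline="") as f:
        for row in csv.DictReader(f):
            samples[row["label"]].append(int(row["elapsed"]))  # elapsed is in ms

    for label, times in sorted(samples.items()):
        times.sort()
        p90 = times[min(len(times) - 1, int(len(times) * 0.9))]
        print(f"{label}: n={len(times)} mean={statistics.mean(times):.0f}ms "
              f"median={statistics.median(times):.0f}ms p90={p90}ms "
              f"stdev={statistics.pstdev(times):.0f}ms")

summarize("baseline.jtl")  # hypothetical baseline results file
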
This way, you can go back to the management and say: "hey, your software is running with this response time and this operation is done in X ms. Are your users OK with that?"
If the answer is yes, you can start the load test, looking at how response time
grows with load...
Of course, you also have to make calculations about the *target load* you want to bear. In my experience, the customer has no interest in knowing
the maximum load for the system: they want to know whether the system can bear the
business they are running.
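To give an idea of the kind of calculation I mean (my own rule of thumb based on Little's law, not anything built into JMeter; all the numbers are invented):

# Rough sizing of the target load from business figures. Little's law:
# concurrency = arrival rate x time each user spends per iteration.
transactions_per_hour = 18000   # assumed business volume
response_time_s = 2.0           # measured in the baseline
think_time_s = 28.0             # assumed user think time

arrival_rate = transactions_per_hour / 3600.0   # transactions per second
concurrent_users = arrival_rate * (response_time_s + think_time_s)
print(f"~{arrival_rate:.1f} tx/s -> about {concurrent_users:.0f} concurrent users")
# -> ~5.0 tx/s -> about 150 concurrent users
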

So, I start increasing the load (i.e. the threads, i.e. the virtual users)
and look at the way response time increases.
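In practice I drive the steps from outside JMeter, something like this sketch (it assumes the Thread Group reads its thread count from the ${__P(threads)} property and that the 'jmeter' command is on the PATH; 'plan.jmx' is a placeholder name):

import subprocess

# Run the same test plan with an increasing number of virtual users and keep
# one result file per step, so response time can be plotted against load.
for threads in (1, 5, 10, 20, 50):
    subprocess.run(
        ["jmeter", "-n",                  # non-GUI mode
         "-t", "plan.jmx",                # hypothetical test plan
         "-Jthreads=" + str(threads),     # read as ${__P(threads)} in the Thread Group
         "-l", f"results_{threads}.jtl"], # one results file per load step
        check=True,
    )
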
For me, the objective of performance testing is to establish what your system
_can_ do, not what you need to accomplish. So when you are setting up your tests, you are trying to drive
your systems at maximum capacity for some extended period of time. Then you 
measure that capacity as your 'baseline'.

For every subsequent release of your code, you measure it against the 
'baseline', and determine whether the code got faster or
slower. If you determine that the slower (or faster) response is acceptable to 
your end users (because you were nowhere near the
user's acceptable standard), you can reset your baseline to that standard. If 
your slower standard is encroaching on the usability
of the system - you can declare that baseline as the minimum spec, and then 
fail any code that exceeds that standard.
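A rough sketch (in Python) of what that kind of gate could look like; the operation names and p90 figures below are invented placeholders, not real measurements:

# Compare a release's per-operation p90 response times (ms) against the stored
# baseline and fail the run if anything regressed by more than the tolerance.
# In practice the numbers would be computed from the JTL files of the two runs.
baseline_p90 = {"login": 310, "registration": 540, "search": 820}
release_p90  = {"login": 325, "registration": 790, "search": 805}

TOLERANCE = 1.10  # allow up to 10% slowdown before declaring a failure

regressions = {
    op: (baseline_p90[op], cur)
    for op, cur in release_p90.items()
    if op in baseline_p90 and cur > baseline_p90[op] * TOLERANCE
}

for op, (base, cur) in regressions.items():
    print(f"FAIL {op}: p90 {cur}ms vs baseline {base}ms")

if regressions:
    raise SystemExit(1)
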

As for how you determine what is acceptable to a 'user', that can be handled in 
a number of ways - without actually improving the
'real' performance of the system. Consider a web page that loads a bunch of 
rows of data in a big table. For most users, if you can
start reading the table within 1-2 seconds, that is acceptable for a system's 
performance. But if there are hundreds of rows of
data, you would not need to load _all_ the rows within 1-2 seconds to actually 
meet their performance criteria. You only need to
load enough rows that the table fills the browser - so they can start reading - 
within the 1-2 second period. JMeter cannot really
measure this timing, it can only measure the 'overall response time' (indeed, I 
don't know any testing tool that can do it). So
trying to define a performance benchmark in terms of what 'users' experience is 
really difficult, and nearly useless (to me anyway).
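(The closest thing JMeter itself records is the 'Latency' column it writes next to 'elapsed': time to the first byte of the response, which is at best a crude proxy for "the user can start reading". A sketch of comparing the two, assuming latency saving is enabled in a CSV JTL and using a made-up sampler label and file name:)

import csv

# Compare 'time to first byte' (Latency) with total response time (elapsed).
# Latency says nothing about how much of the table has actually rendered, so
# it only roughly approximates the "start reading" moment.
with open("results.jtl", newline="") as f:   # hypothetical results file
    rows = [r for r in csv.DictReader(f) if r["label"] == "big_table_page"]

for r in rows[:10]:
    print(f"first byte after {r['Latency']}ms, full response after {r['elapsed']}ms")
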

I look at performance testing as a way to cross-check my development team 
against the perpetual tendency to gum-up the code and slow
things down. So in order to make the testing effective for the developers, I 
need to perf test _very_specific_ things. Trying to
performance test the "system" as a whole is nearly an impossible task - not 
only because there are so many variables that influence
the tests, but precisely because "all of those variables" make it impossible to 
debug which one causes the bottleneck when there is
a change in performance from one release to the next. (Have you ever sent your 
programmers off to 'fix' a performance problem that
turned out to be caused by an O/S update on your server? I have...)

Instead, we create performance tests that test specific functional systems. That is, the 
"login" perf test. The "registration" perf
test. The "..." perf test. Each one of these tests is run independently, so 
that when we encounter a slower benchmark - we can tell
the developers immediately where to concentrate their efforts in fixing the 
problem. (We also monitor all parts of the system (CPU,
IO, Database Transactions: reads, writes, full table scans, etc.) from all
servers involved in the test.) The goal is not to simulate
'real user activity', it is to max out the capacity of at least 1 of the 
servers in the test (specifically the one executing the
'application logic'). If we max out that one server, we know that our 
'benchmark' is the most we can expect of a single member of
our cluster of machines. (We also test a cluster of 2 machines - and measure 
the fall-off in capacity between a 1-member cluster and
a 2-member cluster; this gives us an idea of how much impact our 'clustering'
system has on performance as well.) I suppose you could
say that I look at it this way: we measure the 'maximum capacity', and so long as
the number of users doesn't exceed that, we will
perform OK.
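As a rough sketch of the arithmetic behind that 1-member vs 2-member comparison (the throughput figures below are invented placeholders):

# Scale-out efficiency of the cluster: how much of the second node's capacity
# actually shows up as extra throughput at max load.
single_node_rps = 480.0   # requests/s at max capacity, 1-member cluster
two_node_rps = 870.0      # requests/s at max capacity, 2-member cluster

scaling_factor = two_node_rps / single_node_rps      # ideal would be 2.0
efficiency = two_node_rps / (2 * single_node_rps)    # ideal would be 1.0
print(f"scaling factor {scaling_factor:.2f}x, efficiency {efficiency:.0%}")
# -> scaling factor 1.81x, efficiency 91%
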

We do run some 'all-encompassing' system tests as well, but those are more for 
'stress' testing than for performance benchmarking.
We are specifically looking for things that start to break-down after hours of 
continuous operation at peak capacity. So we monitor
error logs and look to make sure that we aren't throwing errors while under 
stress.
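A sketch of the kind of check that can be run over the results of such a soak run (assuming a CSV JTL with the standard 'timeStamp' and 'success' columns; 'stress.jtl' is a placeholder name):

import csv
from collections import Counter

ok = Counter()
errors = Counter()

# Bucket samples per minute of the soak test and report the error rate, so a
# gradual break-down after hours at peak capacity shows up clearly.
with open("stress.jtl", newline="") as f:   # hypothetical soak-test results
    for row in csv.DictReader(f):
        minute = int(row["timeStamp"]) // 60000   # timeStamp is epoch millis
        if row["success"] == "true":
            ok[minute] += 1
        else:
            errors[minute] += 1

for minute in sorted(set(ok) | set(errors)):
    total = ok[minute] + errors[minute]
    rate = errors[minute] / total
    if rate > 0.01:                               # flag minutes with >1% errors
        print(f"minute {minute}: {errors[minute]}/{total} errors ({rate:.1%})")
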

The number one thing to keep in mind about performance testing is that you have 
to use 'real data'. We actually download our
production database every weekend, and strip out any 'personal information' 
(stuff that we protect in our production environment) by
either nulling it out, or replacing it with bogus data. This allows us to run 
our performance tests against a database that has 100s
of millions of rows of data. Nearly all of our performance 'bugs' have been 
caused by poor data handling in the code (SQL requests
that don't use indices (causing a full table scan), badly formed joins, 
fetching a few rows of data and then looping through them in
the code (when the 'few rows of data' from your 'dev' environment become 
100,000 rows with the production data, this tends to bog
the code down a lot), etc.). So if you are testing with 'faked' data, odds are 
good you will miss a lot of performance issues - no
matter what form of performance testing you use.
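The exact scrubbing obviously depends on the schema, but as a hedged illustration of the "null it out or replace it with bogus data" step (sqlite3 is only used to keep the sketch self-contained; the table and column names are invented):

import sqlite3

# Illustrative scrub of personal data on a copy of the production database:
# null out what is not needed and replace the rest with obviously bogus values.
# A real refresh would target the production schema and run only on the copy.
conn = sqlite3.connect("perf_copy.db")
conn.executescript("""
    CREATE TABLE IF NOT EXISTS users (
        id INTEGER PRIMARY KEY, email TEXT, phone TEXT, last_name TEXT
    );
    UPDATE users SET
        email     = 'user' || id || '@example.invalid',
        phone     = NULL,
        last_name = 'Test' || id;
""")
conn.commit()
conn.close()
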

I will say that we have served over 130M web pages in 1 month, using only 5 
servers (4 tomcats, and 1 DB server)... Those pages
represented about 10X in "GET" requests to our servers...

--
Robin D. Wilson
Sr. Director of Web Development
KingsIsle Entertainment, Inc.
http://www.kingsisle.com




--

Ing. Sergio Boso



