Re: [slim] Re: Performance measurements ?

2005-10-19 Thread Niek Jongerius
>> And as I said in an earlier post, the measurements taken here show the
>> _real_ time it takes for the CLI to perform some database query.

> Sorry, your test tool does not quite do what you say... it does not
> measure _the_ time it takes; rather, it measures only _a_ single run,
> which by itself, *plus* all the perturbing effects of an uncontrolled
> system, is terribly inaccurate. The numbers reported earlier demonstrate
> and support this. A single run of the tool is competing with the rest
> of the processes running on the system - there are dozens or hundreds
> of threads also running and competing.
>
> Your test tool does not isolate the various effects that perturb
> measurements, so it's the benchmarker's job to defeat such effects.
> With the earlier results reported being 2-3x out of agreement, it is
> clear that what is being benchmarked is not in fact what you believe
> is being measured. Therefore, drawing any conclusions is not very
> meaningful, or useful.
>
> Numerous background processes, virus scanners, network activity, disk
> spinup time, low-power to max-power CPU speedup time, swapping, disk
> cache, and hardware interrupts are factors which need to be eliminated
> or reduced before conclusions can be drawn.

All true, but please bear in mind what this tool actually tries to do.
There are quite a few complaints about the performance of the server.
Performance in this context is something that is perceived; it is not
a measurement of top speed. When people complain, they probably just
tried to use their SB. During that test there were all sorts of other
processes running, just as you explained. It is that very experience
of poor performance that makes them act and send out a call for help.

This tool tries to capture exactly that experience. It is _intended_
to run on a system that is polluted by all sorts of junk. The
measurement would not be realistic if there weren't any real-life
interference from whatever tries to slow the server down. All we have
now is some vague indication of performance. If someone complains
"the server stalls when I navigate to that menu, then click right,
and then press play", it could be very handy to have the database
queries that correspond to his actions, and to have his server
(running all the junk that is messing up the machine) spit out a more
tangible value than "it is sooo slow".

I don't expect the tool to be very accurate in light of all that has
been said, but the bottom line is that if someone wants their toy to
play a piece of music, and it takes say one minute before playback
starts whereas a normal server should be able to start in about a
second, this tool could give a more concrete indication of what the
user experiences. If the stats are very poor, maybe people could do
some digging into what is making the server so slow: turn off whatever
service they suspect, run a couple more tests (using the same tool
with the same queries on the tuned server), and if these new tests
show a significant and consistent drop in response time (say, a factor
of two or three), then I guess they are on to something.

These are just ballpark figures (and very probably a huge ballpark at
that), but the tool could still be used to quantify what people see on
their messed-up server. It is no different from the server stats that
the nightlies can spit out: they too have to be scrutinized with care,
and cannot readily be compared to other installs.

Niek.



Re: [slim] Re: Performance measurements ?

2005-10-19 Thread kdf
I'm sure what they all really mean to say, Niek, is "thank you very
much for your contribution". :)


-k



Re: [slim] Re: Performance measurements ?

2005-10-19 Thread Niek Jongerius
> I'm sure what they all really mean to say, Niek, is "thank you very
> much for your contribution". :)

I know. I was just replying "you're welcome, grab a cold one and
put your feet up".

Cheers, Niek.



[slim] Re: Performance measurements ?

2005-10-19 Thread Michaelwagner

Yes, I agree with the above. Thanks for the tool. 

It is what it is: a measure of what really happened this time. It's
not necessarily an accurate, repeatable measuring instrument useful
for finding and squashing performance bugs - you need a better (and
more calibratable) test bed for that - but it does measure the user
experience, and for that it's useful.

In private email a few days ago with Dean I offered to start some
performance benchmarking of the code (initially I am interested in the
MP3 scanning code) with an eye towards code improvement. 

I can't start now - I'm in the midst of quoting a million things in
my day job, and in my night job I'm helping my girlfriend move her
retail operation into a new storefront double the size. That doesn't
leave much time for leisure activities :-) But I'll get started in a
week or two, after the store is open and I get a day or two off.

Michael


-- 
Michaelwagner


[slim] Re: Performance measurements ?

2005-10-19 Thread MrC

I too agree that a Thank You is due for the contribution.  Again, my
comments are not directed at all at the tool, the contribution, or the
author.

And my contributions here are educational, intended for those who do
not understand the issues related to benchmarking and who expect that
the numbers received are indicative of SlimServer problems.

Correct me if I'm wrong, but as a post in the General discussion
forum, which has an audience with varied knowledge levels, it does
seem reasonable to provide some insight into what causes anomalies,
and how benchmarking and performance evaluation must be controlled to
draw meaningful conclusions. In essence, what I'm saying to those who
don't have this background is: understand before blame.

I'll close again with a thanks to everyone for helping to make such an
outstanding product!


-- 
MrC


[slim] Re: Performance measurements ?

2005-10-18 Thread Michaelwagner

The only thing that makes sense to me here is a caching artifact. But it
didn't happen with Mike's other test on his other computer ...


-- 
Michaelwagner


Re: [slim] Re: Performance measurements ?

2005-10-18 Thread Michael Herger

> The only thing that makes sense to me here is a caching artifact. But it
> didn't happen with Mike's other test on his other computer ...


I can confirm it and offer a possible explanation: slimserver had been
idle for about four days before I did that test over an ssh connection.
As the mail and web servers on that machine run 24/7, it's pretty
probable that slimserver had been swapped out.


--

Michael

---
Help translate SlimServer by using the
StringEditor Plugin (http://www.herger.net/slim/)


Re: [slim] Re: Performance measurements ?

2005-10-18 Thread Marc Sherman

Michaelwagner wrote:
> Michael Herger wrote:
>> [20.891] titles 0 10
>> [16.235] titles 0 100
>
> Why would 10 titles take more time than 100?


Ramp-up anomalies (due to pre-fetching, caching, lazy code loading,
etc.) are very common in performance testing. The usual methodology to
eliminate those effects is to run the entire series of tests a few
times first and throw those results away, before you start recording
reportable results.
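
In code, the methodology looks roughly like the sketch below. This is
only an illustration, not taken from the tool discussed in this thread;
time_query() is a placeholder for one timed CLI round-trip.

  #include <stdio.h>
  #include <sys/time.h>

  #define WARMUP  3   /* passes run and thrown away */
  #define SAMPLES 10  /* passes actually recorded */

  /* placeholder: the real thing would send cmd to the server and
     measure how long the reply takes */
  static double time_query(const char *cmd)
  {
      struct timeval t0, t1;
      gettimeofday(&t0, NULL);
      /* ... send cmd, wait for the newline-terminated reply ... */
      (void)cmd;
      gettimeofday(&t1, NULL);
      return (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
  }

  int main(void)
  {
      const char *cmd = "titles 0 100";
      double total = 0.0;

      for (int i = 0; i < WARMUP; i++)
          (void)time_query(cmd);        /* prime caches, discard result */

      for (int i = 0; i < SAMPLES; i++)
          total += time_query(cmd);     /* keep only post-warm-up runs */

      printf("[%.3f] %s (mean of %d runs)\n",
             total / SAMPLES, cmd, SAMPLES);
      return 0;
  }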


- Marc


[slim] Re: Performance measurements ?

2005-10-17 Thread Michaelwagner

This is counter-intuitive ...

andreas wrote:
> Michael Herger wrote:
>> [20.891] titles 0 10
>> [16.235] titles 0 100

Why would 10 titles take more time than 100?


-- 
Michaelwagner


[slim] Re: Performance measurements ?

2005-10-17 Thread MrC

Michaelwagner wrote:
> This is counter-intuitive ...
>
> Why would 10 titles take more time than 100?
I indicated in an earlier post that the testing methodologies being
used here are uncontrolled, and the margin of error is too high to
have any meaning. Without proper controls in place and a reduction of
all extraneous variables, such tests should be treated as for
amusement only.


-- 
MrC


Re: [slim] Re: Performance measurements ?

2005-10-17 Thread Niek Jongerius
> Why would 10 titles take more time than 100?

Your guess is as good as (or possibly better than) mine. But there is
no introduced artifact from the test program that I can see (the very
simple source is here: http://media.qwertyboy.org/files/sstime.c).
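
For the curious, a stripped-down sketch of the idea looks roughly like
the following. To be clear, this is an illustration, not the actual
sstime.c; it assumes the CLI on its default port (9090) on localhost,
and that each response comes back as a single newline-terminated line.

  /* sketch: time one SlimServer CLI command over TCP */
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/time.h>
  #include <sys/socket.h>
  #include <netinet/in.h>
  #include <arpa/inet.h>

  int main(void)
  {
      const char *cmd = "titles 0 10\n";      /* query under test */
      char buf[4096];
      struct timeval t0, t1;
      struct sockaddr_in sa;

      int s = socket(AF_INET, SOCK_STREAM, 0);
      memset(&sa, 0, sizeof sa);
      sa.sin_family = AF_INET;
      sa.sin_port = htons(9090);              /* default CLI port */
      sa.sin_addr.s_addr = inet_addr("127.0.0.1");
      if (s < 0 || connect(s, (struct sockaddr *)&sa, sizeof sa) < 0) {
          perror("connect");
          return 1;
      }

      gettimeofday(&t0, NULL);
      write(s, cmd, strlen(cmd));
      ssize_t n;
      int done = 0;
      while (!done && (n = read(s, buf, sizeof buf)) > 0)
          done = (memchr(buf, '\n', n) != NULL);  /* reply complete */
      gettimeofday(&t1, NULL);

      double secs = (t1.tv_sec - t0.tv_sec)
                  + (t1.tv_usec - t0.tv_usec) / 1e6;
      printf("[%.3f] %s", secs, cmd);           /* same format as above */
      close(s);
      return 0;
  }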

> I indicated in an earlier post that the testing methodologies being
> used here are uncontrolled, and the margin of error is too high to
> have any meaning. Without proper controls in place and a reduction
> of all extraneous variables, such tests should be treated as for
> amusement only.

And as I said in an earlier post, the measurements taken here show the
_real_ time it takes for the CLI to perform some database query. Yes,
the numbers that various installs yield are probably hard to compare
with each other, but the fact remains that this _is_ the time some
defined query takes to return results using the CLI. If the CLI gives
similar performance to a Slimpy/SB/SB2 when executing queries (and I'm
not knowledgeable enough to say it does, but I can't see why not),
then these numbers give a good indication of how long our beloved
hardware has to wait for the server to respond.

Again, this is _NOT_ meant to show how the server performs in an
ideal, controlled environment; this is a down-to-earth measurement of
real-life installs. Can someone tell me how these performance
measurements differ conceptually from the graphs Triode made that are
in the nightlies? Are they also "for amusement only"? They too give
some idea of a real install, and are not meant for an ideal,
controlled environment.

Now if we could come up with a set of CLI commands that gives a good
representation of what a real scenario would fire at the database, we
would have an objective indication of performance instead of vague
statements like "it is too slow" or whatever. _That_ is what I'm
trying to get to.

Unless I am totally off base here...

Niek.



[slim] Re: Performance measurements ?

2005-10-15 Thread Niek Jongerius

I have placed the programs and the source for the Linux version on a
regular page on my site. The site itself is under active development,
so please bear with me if I have screwed things up again. See:

http://media.qwertyboy.org/mono/niek.aspx?Page=SqueezeBox


-- 
Niek Jongerius


Re: [slim] Re: Performance measurements ?

2005-10-14 Thread Niek Jongerius
>> Second pass:
>> [55.391] titles 0 1
>>
>> Third pass:
>> [66.390] titles 0 1
>
> With a 20% variance, we can see that the testing methodology and
> environment are very rough.
>
> And, with Niek running a P4/2.8 getting
>
> [24.201] titles 0 1
>
> and Bill running a P4/3 getting an average of
>
> [60.114] titles 0 1
>
> There's an almost 2.5x difference in times, running on hardware where
> the hardware specs alone would account for only roughly a 10%
> difference.
>
> Hopefully nobody will look at these data points and attempt to draw
> unwarranted conclusions.

Agreed. There are too many variables in the way machines are set up to
readily compare output numbers. CPU and RAM are by no means the only
variables here: OS, processes running, process priorities, the
intermediate network (if the test program is run over a network), and
so on.

But the bottom line still is that it takes that reported amount of
time for the SlimServer to cough up the requested data (assuming the
CLI gets at the data in a comparable way). If Bill gets 2.5 times
worse performance on the same tests as I do, I would assume his setup
performs about that factor worse than mine when serving a SqueezeBox.
The proggy does nothing fancy (I'll post the source on my site in a
minute); it just times the start and end of the CLI command.

I have not been very inventive with the queries I put in my sample
input file, and it could be that my example commands are somehow not
representative for gauging performance. Someone with a better
understanding of what reasonable queries actually look like could
maybe suggest a few; it's just a matter of editing the input file to
test other CLI commands.
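
For example, a starting input file could look like the lines below.
The titles queries are the ones already used in this thread; the
others are only guesses at what a real session might fire, so check
the CLI documentation for the exact commands your SlimServer version
supports:

  titles 0 10
  titles 0 100
  albums 0 50
  artists 0 50
  genres 0 50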

Niek.



Re: [slim] Re: Performance measurements ?

2005-10-14 Thread Niek Jongerius

> Yeah, you really need to be careful about methodology here.
>
> If you want to select typical things and get typical response times,
> you probably need to carefully think out the typical things people do
> at the user interface, mimic them in the CLI, and run them on many
> different configurations.
>
> I doubt many people list the top thousand songs when they're at the
> remote.
>
> If you want to benchmark the code to do before-and-after studies of
> code improvements, you need to have one typical machine, benchmark it
> accurately AND THEN FREEZE IT AND DON'T CHANGE IT. That almost means
> dedicating it to the benchmarking task and making it a reference
> system, because you never know when installing Microsoft Office 2007
> (heaven help us) or IE 17.2 will change the way I/O works or how much
> background activity there is cluttering up the disk.

Note that I did not intend this to be a benchmark tool. In a lot of
posts on this list people said the performance of their install was
"insert your favourite speed indication here", which is a very
subjective indication. This program simply times a request to the CLI.
It should give some idea of what a user would see when using a real
SqueezeBox (assuming we use a reasonable set of CLI queries, which I
probably don't).

There are even some of us desperately switching OSes and tweaking
stuff on the same machine, and discussing whether ActiveState is
faster than compiled Windows or Cygwin or whatever. This tool can then
give _some_ numbers. Again, you cannot compare them one-to-one with
other installs, but IMHO if one install does something in 20 seconds
and another one in just 2, that same difference will show up in the
user experience when connecting and using a SqueezeBox.

Niek.



[slim] Re: Performance measurements ?

2005-10-14 Thread Michaelwagner

I can probably pull together an equivalent routine for Windows systems
that will work across all current Windows platforms, but I won't be
able to get to it until next week - I'm helping the spouse move her
place of work this weekend.

It's a good idea, what this routine does. I didn't mean to disparage
it in a previous post. It's just that we must realize what it does
(and, more importantly, does not) test.

But I think the idea would make an excellent test bed for regular
performance regression testing. That is: once a week or so, download
the latest nightly, run a standardized script of enquiries against a
standardized configuration (a static set of music files not otherwise
used), and see if performance changes on any of the enquiries. If it
improves, great. If it takes a sudden nosedive, that's an early
warning that something in that code path needs attention (or perhaps
it's a known thing, because it's supporting new functionality).
Anyway, it's a warning system.
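
The comparison step could be as simple as the sketch below, assuming
the timing tool's output lines look like "[12.345] titles 0 10" and
that both logs list the same queries in the same order:

  /* sketch: flag queries that got much slower between two runs */
  #include <stdio.h>

  #define THRESHOLD 1.5   /* flag anything more than 50% slower */

  int main(int argc, char **argv)
  {
      if (argc != 3) {
          fprintf(stderr, "usage: %s baseline.log current.log\n", argv[0]);
          return 1;
      }
      FILE *base = fopen(argv[1], "r"), *cur = fopen(argv[2], "r");
      if (!base || !cur) { perror("fopen"); return 1; }

      char bline[256], cline[256];
      while (fgets(bline, sizeof bline, base) &&
             fgets(cline, sizeof cline, cur)) {
          double told, tnew;
          char cmd[200] = "";
          if (sscanf(bline, "[%lf]", &told) != 1 ||
              sscanf(cline, "[%lf] %199[^\n]", &tnew, cmd) != 2)
              continue;                 /* skip malformed lines */
          if (tnew > told * THRESHOLD)
              printf("REGRESSION: %s  %.3f -> %.3f\n", cmd, told, tnew);
      }
      fclose(base); fclose(cur);
      return 0;
  }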

Michael


-- 
Michaelwagner


[slim] Re: Performance measurements ?

2005-10-13 Thread MrC

Some food for thought...

 
> Second pass:
> [55.391] titles 0 1
>
> Third pass:
> [66.390] titles 0 1
 
With a 20% variance, we can see that the testing methodology and
environment are very rough.

And, with Niek running a P4/2.8 getting

[24.201] titles 0 1

and Bill running a P4/3 getting an average of 

[60.114] titles 0 1

There's an almost 2.5x difference in times, running on hardware where
the hardware specs alone would account for only roughly a 10%
difference.

Hopefully nobody will look at these data points and attempt to draw
unwarranted conclusions.


-- 
MrC


[slim] Re: Performance measurements ?

2005-10-13 Thread Michaelwagner

Yeah, you really need to be careful about methodology here.

If you want to select typical things and get typical response times,
you probably need to carefully think out the typical things people do
at the user interface, mimic them in the CLI, and run them on many
different configurations.

I doubt many people list the top thousand songs when they're at the
remote.

If you want to benchmark the code to do before-and-after studies of
code improvements, you need to have one typical machine, benchmark it
accurately AND THEN FREEZE IT AND DON'T CHANGE IT. That almost means
dedicating it to the benchmarking task and making it a reference
system, because you never know when installing Microsoft Office 2007
(heaven help us) or IE 17.2 will change the way I/O works or how much
background activity there is cluttering up the disk.


-- 
Michaelwagner