On Tue, 27 Mar 2007, Magnus Hagander wrote:

Would not at least some of these numbers be better presented through the
stats collector, so they can be easily monitored?
That goes along the line of my way way way away from finished attempt
earlier, perhaps a combination of these two patches?

When I saw your patch recently, I thought to myself "hmmm, the data collected here sure looks familiar"--you even made some of the exact same code changes I did. I've been bogged down recently chasing a performance issue that, as it turned out, was mainly caused by the "high CPU usage for stats collector" bug. That bug caused the background writer to slow to a crawl under heavy load, which is why I was having all the checkpoint and writer issues that got me monitoring that code in the first place.

With that seemingly resolved, I have a slightly new plan. Next I want to take the data I've been collecting in my patch, bundle the most important parts of it into messages sent to the stats collector, the way it was suggested you rewrite your patch, and then submit the result. I've got the log files down and have a really good idea of what data should be collected, but as this would be my first time adding stats, I'd certainly love some help with that.

Once that monitoring infrastructure is in place, I plan to merge Itagaki's "Load distributed checkpoint" patch (it touches a lot of the same code) and test it under heavy load. I think you get a much better context for evaluating that patch if, rather than measuring just its gross results, you can say something like "with the patch in place the average fsync time on my system dropped from 3 seconds to 1.2 seconds when writing out more than 100MB at checkpoint time". That's the direct cause of the biggest problem in that area of code, so why not stare right at it rather than measuring it indirectly?

* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

