Jeff Davis wrote:
On Tue, 2007-07-03 at 12:33 -0400, Christopher Browne wrote:
I'll see about doing an experiment on this to see if, for the DELETE
case, it seems to actually help. It may be that the performance
effects are small to none, so that the added code complication isn't
worthwhile.
In a simple test I ran, a DELETE of the entire 5M-record table via a
sequential scan was MUCH faster (9.41s) than 5M individual DELETE
statements in a single transaction (552.49s).
5M records is small enough to fit into memory. I expect the difference
would be even greater when the index and table can't both fit into
memory and the deletes are distributed randomly over the table.
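To make the comparison concrete, here's roughly what the two timed
cases look like as SQL (table and column names here are illustrative):
-- case A: one full-table DELETE, done with a sequential scan (~9.41s)
begin;
delete from foo;
commit;
-- case B: 5M single-row DELETEs in one transaction (~552.49s)
begin;
delete from foo where id = 0;
delete from foo where id = 1;
-- ... one statement per row, all the way up to ...
delete from foo where id = 4999999;
commit;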
I think it is worth exploring ways of solving this problem. Right now
slony is great for small inserts, updates, and deletes. But any large
update/delete on the origin can cause the subscribers to fall way
behind.
The "handy" alternative test (which would make a good "smoke test" for
whether it's worth putting *any* effort into this) would be to try some
partial groupings of the deletes and see whether they help.
Thus, if the whole delete covers the range id = 0 through id = 5000000,
then the things to try would be (each case involving 1 transaction):
1. Delete with the 5M individual DELETE statements (which you found
took 552.49s).
2. Delete with 50K DELETE statements, each having a WHERE clause with
100 items in it.
3. Delete with 5K DELETE statements, each having a WHERE clause with 1K
items in it.
If 2. or 3. come *way* closer to 9.41s, then it may be worth exploring
the complexity of folding together adjacent deletes on the same table.
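For reference, the folded statements for 2. and 3. would look something
like this (the elided terms are marked with comments; the id values are
just illustrative):
-- case 2: 50K statements, each naming 100 ids
delete from foo where id=0 or id=1 or /* ...97 more... */ id=99;
delete from foo where id=100 or id=101 or /* ...97 more... */ id=199;
-- case 3: 5K statements, each naming 1000 ids
delete from foo where id=0 or id=1 or /* ...997 more... */ id=999;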
There could also be a case made for trying sequential versus random
orderings (e.g. in the former case, each DELETE statement covers a
contiguous range of ids, whereas in the latter, each selects ids more
or less at random).
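As a sketch of the random variant, each statement would still name the
same number of ids, but drawn from a shuffled sequence instead of a
contiguous block (the particular values below are made up purely for
illustration):
-- random ordering: ids scattered across the whole table
delete from foo where id=4211893 or id=77 or /* ... */ id=2904466;
delete from foo where id=18 or id=3500102 or /* ... */ id=999243;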
I'll see about constructing a series of tests like this; they won't be
running before I send this :-). If you have time to generate 2. and/or
3. on your system and get timings there, I'd be much obliged.
;;;; Here's some relevant code :-)
;;;; It emits case 2: 50K DELETE statements, each naming 100 ids, all in
;;;; one transaction; redirect the output into a file and feed it to psql.
(format t "begin;~%")
(loop for i from 0 to 49999
      do (format t "delete from foo where ")
         (loop for j from 0 to 98
               do (format t "id=~D or " (+ j (* i 100))))
         (format t "id=~D;~%" (+ 99 (* i 100))))
(format t "commit;~%")
I don't think we can save the full 543 seconds, but if we could save a
good portion of it, it's worth trying to pursue...
Remember, the thought we started with was:
"What if we could do something that would make mass operations less
expensive?"
I don't want to introduce anything that can materially increase
processing costs.
The more intelligent we try to get, the more expensive the
logtrigger() function gets, and if the price is high enough, then we
gain nothing.
The only "win" I see is if we can opportunistically join some
statements together. If we have to make the log trigger function
universally *WAY* more expensive, well, that's a performance loss :-(.
Is there any way that we could detect (even without 100% confidence)
that a transaction is "big" and we should spend more effort trying to
optimize it?
Regrettably, no. For us to switch over to a sort of log trigger that
supports "doing something smarter" requires adding logic that will have
some (definitely non-zero) cost any time it *isn't* worthwhile. And the
"usual sorts of OLTP activity" will fall into the category where
performance would suffer.
I know that's a big project (in the general case), but there might be
some simple things that would work.
Well, the cases I suggested (2. and 3.) would fall into "simple cases".