Jeff Davis wrote:
On Tue, 2007-07-03 at 12:33 -0400, Christopher Browne wrote:
I'll see about doing an experiment on this to see if, for the DELETE
case, it seems to actually help. It may be that the performance
effects are small to none, so that the added code complication isn't
worthwhile.
In a simple test I ran, a DELETE of the entire 5M-record table via a
sequential scan was MUCH faster (9.41s) than 5M individual DELETE
statements in a single transaction (552.49s).
5M records is small enough to fit into memory. I expect the difference
would be even greater when the index and table can't both fit into
memory and the deletes are distributed randomly over the table.
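To make the comparison concrete, here's roughly what the two timed
cases look like as SQL (table and column names here are illustrative):
-- case A: one full-table DELETE, done with a sequential scan (~9.41s)
begin;
delete from foo;
commit;
-- case B: 5M single-row DELETEs in one transaction (~552.49s)
begin;
delete from foo where id = 0;
delete from foo where id = 1;
-- ... one statement per row, all the way up to ...
delete from foo where id = 4999999;
commit;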
I think it is worth exploring ways of solving this problem. Right now
slony is great for small inserts, updates, and deletes. But any large
update/delete on the origin can cause the subscribers to fall way
behind.
The "handy" alternative test (which would make a good "smoke test" for
whether it's worth putting *any* effort into this) would be to try some
partial groupings of the deletes and see whether they help.
Thus, if the whole delete covers the range id = 0 through id = 5000000,
then the things to try would be (each case involving 1 transaction):
1. Delete with the 5M individual DELETE statements (which you found
took 552.49s).
2. Delete with 50K DELETE statements, each having a WHERE clause with
100 items in it.
3. Delete with 5K DELETE statements, each having a WHERE clause with 1K
items in it.
If 2. or 3. come *way* closer to 9.41s, then it may be worth exploring
the complexity of folding together adjacent deletes on the same table.
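For reference, the folded statements for 2. and 3. would look something
like this (the elided terms are marked with comments; the id values are
just illustrative):
-- case 2: 50K statements, each naming 100 ids
delete from foo where id=0 or id=1 or /* ...97 more... */ id=99;
delete from foo where id=100 or id=101 or /* ...97 more... */ id=199;
-- case 3: 5K statements, each naming 1000 ids
delete from foo where id=0 or id=1 or /* ...997 more... */ id=999;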
There could also be a case made for trying sequential versus random
orderings (e.g. in the former case, each DELETE statement covers a
contiguous range of ids, whereas in the latter, each selects ids more
or less at random).
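As a sketch of the random variant, each statement would still name the
same number of ids, but drawn from a shuffled sequence instead of a
contiguous block (the particular values below are made up purely for
illustration):
-- random ordering: ids scattered across the whole table
delete from foo where id=4211893 or id=77 or /* ... */ id=2904466;
delete from foo where id=18 or id=3500102 or /* ... */ id=999243;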
I'll see about constructing a series of tests like this; they won't be
running before I send this :-). If you have time to generate 2. and/or
3. on your system and get timings there, I'd be much obliged.
;;;; Here's some relevant code :-)
;;;; It emits case 2: 50K DELETE statements, each naming 100 ids, all in
;;;; one transaction; redirect the output into a file and feed it to psql.
(format t "begin;~%")
(loop for i from 0 to 49999
      do (format t "delete from foo where ")
         (loop for j from 0 to 98
               do (format t "id=~D or " (+ j (* i 100))))
         (format t "id=~D;~%" (+ 99 (* i 100))))
(format t "commit;~%")
I don't think we can save the full 543 seconds, but if we could save a
good portion of it, it's worth trying to pursue...
Remember, the thought we started with was:
"What if we could do something that would make mass operations less
expensive?"
I don't want to introduce anything that can materially increase
processing costs.
The more intelligent we try to get, the more expensive the
logtrigger() function gets, and if the price is high enough, then we
gain nothing.
The only "win" I see is if we can opportunistically join some
statements together. If we have to make the log trigger function
universally *WAY* more expensive, well, that's a performance loss :-(.
Is there any way that we could detect (even without 100% confidence)
that a transaction is "big" and we should spend more effort trying to
optimize it?
Regrettably, no. For us to switch over to a sort of log trigger that
supports "doing something smarter" requires adding logic that will have
some (definitely non-zero) cost any time it *isn't* worthwhile. And the
"usual sorts of OLTP activity" will fall into the category where
performance would suffer.
I know that's a big project (in the general case), but there might be
some simple things that would work.
Well, the cases I suggested (2. and 3.) would fall into "simple cases".