John Arbash Meinel wrote: >Matthew Schumacher wrote: > > > >>All it's doing is trying the update before the insert to get around the >>problem of not knowing which is needed. With only 2-3 of the queries >>implemented I'm already back to running about the same speed as the >>original SA proc that is going to ship with SA 3.1.0. >> >>All of the queries are using indexes so at this point I'm pretty >>convinced that the biggest problem is the sheer number of queries >>required to run this proc 200 times for each email (once for each token). >> >>I don't see anything that could be done to make this much faster on the >>postgres end, it's looking like the solution is going to involve cutting >>down the number of queries some how. >> >>One thing that is still very puzzling to me is why this runs so much >>slower when I put the data.sql in a transaction. Obviously transactions >>are acting different when you call a proc a zillion times vs an insert >>query. >> >> >> >> >Well, I played with adding a COMMIT;BEGIN; statement to your exact test >every 1000 lines. And this is what I got: > > Just for reference, I also tested this on my old server, which is a dual Celeron 450 with 256M ram. FC4 and Postgres 8.0.3 Unmodified: real 54m15.557s user 0m24.328s sys 0m14.200s
With Transactions every 1000 selects, and vacuum every 5000: real 8m36.528s user 0m16.585s sys 0m12.569s With Transactions every 1000 selects, and vacuum every 10000: real 7m50.748s user 0m16.183s sys 0m12.489s On this machine vacuum is more expensive, since it doesn't have as much ram. Anyway, on this machine, I see approx 7x improvement. Which I think is probably going to satisfy your spamassassin needs. John =:-> PS> Looking forward to having a spamassassin that can utilize my favorite db. Right now, I'm not using a db backend because it wasn't worth setting up mysql. >Unmodified: >real 17m53.587s >user 0m6.204s >sys 0m3.556s > >With BEGIN/COMMIT: >real 1m53.466s >user 0m5.203s >sys 0m3.211s > >So I see the potential for improvement almost 10 fold by switching to >transactions. I played with the perl script (and re-implemented it in >python), and for the same data as the perl script, using COPY instead of >INSERT INTO means 5s instead of 33s. > >I also played around with adding VACUUM ANALYZE every 10 COMMITS, which >brings the speed to: > >real 1m41.258s >user 0m5.394s >sys 0m3.212s > >And doing VACUUM ANALYZE every 5 COMMITS makes it: >real 1m46.403s >user 0m5.597s >sys 0m3.244s > >I'm assuming the slowdown is because of the extra time spent vacuuming. >Overall performance might still be improving, since you wouldn't >actually be inserting all 100k rows at once. > > ... >This is all run on Ubuntu, with postgres 7.4.7, and a completely >unchanged postgresql.conf. (But the machine is a dual P4 2.4GHz, with >3GB of RAM). > >John >=:-> > >
signature.asc
Description: OpenPGP digital signature