On 6 November 2012 13:38, Gunnar "Nick" Bluth <gunnar.bl...@pro-open.de>wrote:
> Am 06.11.2012 18:38, schrieb Petr Praus: > > > Yes, but note that this happens only in Linux. Increasing work_mem on my > iMac increases performance (but the queries are slower under OSX than on > virtualized Ubuntu on the same machine). Over the weekend, I tried the same > test on my Ubuntu home machine with Ivy Bridge i5 3570K and it also slows > down (from ~900ms with work_mem=1MB to ~1200ms with work_mem=96MB). > > > OS X is rather different from a memory access point of view, IIRC. So the > direct comparison actually only shows how well the Linux FS cache works > (for the temp files created with small work_mem ;-). > > The i5 puzzles me a bit though... > > > >> I'm pretty sure you're hitting some subtle, memory-access-related >> cornercase here. >> >> The L2 cache of your X7350 CPUs is 2MB, could you run the tests with, >> say, 1, 2, 4 and 8MB of work_mem and post the results? >> > I made a pgbench test with the same query and run it 25 times (5 clients, > 5 transactions each): > work_mem speed > 1MB 1794ms > 2MB 1877ms > 4MB 2084ms > 8MB 2141ms > 10MB 2124ms > 12MB 3018ms > 16MB 3004ms > 32MB 2999ms > 64MB 3015ms > > It seems that there is some sort of "plateau". > > Two, afaics. The 1->2 change hints towards occasionally breaching your L2 > cache, so it can probably be ignored. The actual plateaus thus seem to be > 0-2, 2-12, >= 12. > It'd be interesting to see the EXPLAIN ANALYSE outputs for these levels, > the buckets and batches in particular. I'd reckon we'll see significant > changes at 2->4 and 10->12MB work_mem. > Here are the explains, I run the query a few times before actually taking the explain to warm up the caches. (I also noticed that explain slows down the query execution which is probably to be expected.) 2MB: http://explain.depesz.com/s/ul1 4MB: http://explain.depesz.com/s/IlVu 10MB: http://explain.depesz.com/s/afx3 12MB: http://explain.depesz.com/s/i0vQ So, to sum this up (and make someone more competent bite on it maybe ;-), > on your SMP, FSB, "fake-multicore" system all "hash"-related works that > potentially switch to different implementations internally (but w/out > telling us so) when given more work_mem are slower. > > See other post... it actually does tell us (# of buckets/batches). > However, the result is not good and could potentially be improved be > twealing the statistic_targets of the joined tables/columns. > > I wonder why noone actually understanding the implementation chipped in > yet... Andres, Greg, Tom, whoever actually understands what's happening > here, anyone reading this? ;-) > > Cheers, > > -- > Gunnar "Nick" Bluth > RHCE/SCLA > > Mobil +49 172 8853339 > Email: gunnar.bl...@pro-open.de > __________________________________________________________________________ > In 1984 mainstream users were choosing VMS over UNIX. Ten years later > they are choosing Windows over UNIX. What part of that message aren't you > getting? - Tom Payne > >