Re: [PATCHES] hash index improving v3
On Tue, 2008-09-23 at 00:48 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > I wasn't very happy with effective_cache_size and not happy with
> > shared_buffers either. If building hash indexes is memory critical then
> > we just need to say so and encourage others to set memory use correctly.
> > People are already aware that maintenance_work_mem needs to be increased
> > for large index builds and we will confuse people if we ignore that and
> > use another parameter instead.
>
> I think you've got this completely backwards. The general theory about
> maintenance_work_mem is "set it as large as you can stand it". The
> issue at hand here is that the crossover point for hash index sort
> building seems to be a good deal less than all-the-memory-you-have.

There's an optimal point for btree builds using sorts that is a good deal
less also, so I don't get that. Plus, if you give it
all-the-memory-you-have there won't be anything left for anything else.
You might set it that high for an empty database load, but you don't set
it that high in production unless you want to see swapping when you
create large indexes.

maintenance_work_mem is already used for 3 separate operations that bear
little resemblance to each other. If it's appropriate for all of those
then it's appropriate for this usage also. Doing it otherwise is going to
mean more than 50% of people do the wrong thing, which would be a shame
because what's been done looks very cool.

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches
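For reference, the per-build override both posters take for granted — raising maintenance_work_mem for one session only, so the production setting stays modest — looks roughly like this. The 1GB figure and the table/index names are illustrative placeholders, not recommendations from the thread:

```sql
-- Session-local bump for a large index build; affects only this session
-- and reverts on RESET or disconnect. Names and values are illustrative.
SET maintenance_work_mem = '1GB';
CREATE INDEX big_hash_idx ON big_table USING hash (id);
RESET maintenance_work_mem;
```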
Re: [PATCHES] hash index improving v3
Simon Riggs <[EMAIL PROTECTED]> writes:
> I wasn't very happy with effective_cache_size and not happy with
> shared_buffers either. If building hash indexes is memory critical then
> we just need to say so and encourage others to set memory use correctly.
> People are already aware that maintenance_work_mem needs to be increased
> for large index builds and we will confuse people if we ignore that and
> use another parameter instead.

I think you've got this completely backwards. The general theory about
maintenance_work_mem is "set it as large as you can stand it". The issue
at hand here is that the crossover point for hash index sort building
seems to be a good deal less than all-the-memory-you-have.

Perhaps there is a case for giving this behavior its very own
configuration parameter; but seeing that we still don't have all that
much of a use case for hash indexes at all, I don't feel a need to do
that yet. In any case, tying it to maintenance_work_mem is certainly
wrong.

			regards, tom lane
Re: [PATCHES] hash index improving v3
On Tue, 2008-09-23 at 00:05 -0400, Tom Lane wrote:
> "Jonah H. Harris" <[EMAIL PROTECTED]> writes:
> > On Mon, Sep 22, 2008 at 11:25 PM, Alex Hunsaker <[EMAIL PROTECTED]> wrote:
> >> On Sun, Sep 14, 2008 at 8:16 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
> >>> I'm considering changing hashbuild to switch over at shared_buffers
> >>> instead of effective_cache_size --- any thoughts about that?
> >>
> >> Switching to shared_buffers gets my vote, on my test table with
> >> 50,000,000 rows it takes about 8 minutes to create an index using the
> >> default effective_cache_size. With effective_cache_size set to 6GB
> >> (machine has 8GB) it's still going an hour later.
>
> > Agreed. I think using shared_buffers as a cutoff is a much better idea
> > as well.
>
> Already done in CVS a week or so back, but thanks for following up with
> some confirmation.

I think nobody noticed that change, or the discussion.

I wasn't very happy with effective_cache_size and not happy with
shared_buffers either. If building hash indexes is memory critical then
we just need to say so and encourage others to set memory use correctly.
People are already aware that maintenance_work_mem needs to be increased
for large index builds and we will confuse people if we ignore that and
use another parameter instead.

At the very least, a controllable parameter is the only appropriate
choice for production systems, not one that cannot be changed without
restart. It would also throw away any chance of having pg_restore of
hash indexes run in parallel, since it will not be able to control
memory usage.

Setting maintenance_work_mem higher than shared_buffers is easily
possible now and we should just use that rather than try to prejudge how
people will and can configure their systems. If the maintenance_work_mem
default is too low, let's increase it.
-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Re: [PATCHES] hash index improving v3
"Jonah H. Harris" <[EMAIL PROTECTED]> writes:
> On Mon, Sep 22, 2008 at 11:25 PM, Alex Hunsaker <[EMAIL PROTECTED]> wrote:
>> On Sun, Sep 14, 2008 at 8:16 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
>>> I'm considering changing hashbuild to switch over at shared_buffers
>>> instead of effective_cache_size --- any thoughts about that?
>>
>> Switching to shared_buffers gets my vote, on my test table with
>> 50,000,000 rows it takes about 8 minutes to create an index using the
>> default effective_cache_size. With effective_cache_size set to 6GB
>> (machine has 8GB) it's still going an hour later.

> Agreed. I think using shared_buffers as a cutoff is a much better idea
> as well.

Already done in CVS a week or so back, but thanks for following up with
some confirmation.

			regards, tom lane
Re: [PATCHES] hash index improving v3
On Mon, Sep 22, 2008 at 7:57 PM, Alex Hunsaker <[EMAIL PROTECTED]> wrote:
> 50,000,000 rows and 32,768,000 collisions

I should mention that's 50 million rows + 32 million more, or 82,768,000
rows total, which explains some of the added index creation time...

> index time:
> head: 576600.967 ms
> v5: 487949.490 ms
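As a quick check, the total follows directly from the two figures quoted above:

```sql
-- 50 million base rows plus 32,768,000 collision rows:
SELECT 50000000 + 32768000 AS total_rows;  -- 82768000
```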
Re: [PATCHES] hash index improving v3
On Mon, Sep 22, 2008 at 11:25 PM, Alex Hunsaker <[EMAIL PROTECTED]> wrote:
> On Sun, Sep 14, 2008 at 8:16 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
>> BTW, one thing I noticed was that the hash index build time for the
>> "wide column" case got a lot worse after applying the patch (from 56 to
>> 237 sec). The reason for this turned out to be that with the smaller
>> predicted index size, the code decided not to use the pre-sorting method
>> that was recently added. Reducing effective_cache_size to less than the
>> index size brought the time back down, to about 54 sec. So it would
>> seem that effective_cache_size is too large a cutoff value. I'm
>> considering changing hashbuild to switch over at shared_buffers instead
>> of effective_cache_size --- any thoughts about that?
>
> Switching to shared_buffers gets my vote, on my test table with
> 50,000,000 rows it takes about 8 minutes to create an index using the
> default effective_cache_size. With effective_cache_size set to 6GB
> (machine has 8GB) it's still going an hour later.

Agreed. I think using shared_buffers as a cutoff is a much better idea
as well.

-- 
Jonah H. Harris, Senior DBA
myYearbook.com
Re: [PATCHES] hash index improving v3
On Sun, Sep 14, 2008 at 8:16 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
> BTW, one thing I noticed was that the hash index build time for the
> "wide column" case got a lot worse after applying the patch (from 56 to
> 237 sec). The reason for this turned out to be that with the smaller
> predicted index size, the code decided not to use the pre-sorting method
> that was recently added. Reducing effective_cache_size to less than the
> index size brought the time back down, to about 54 sec. So it would
> seem that effective_cache_size is too large a cutoff value. I'm
> considering changing hashbuild to switch over at shared_buffers instead
> of effective_cache_size --- any thoughts about that?

Switching to shared_buffers gets my vote, on my test table with
50,000,000 rows it takes about 8 minutes to create an index using the
default effective_cache_size. With effective_cache_size set to 6GB
(machine has 8GB) it's still going an hour later.
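A minimal way to reproduce the comparison described above, assuming a test table shaped like the ones used elsewhere in this thread (table and index names are placeholders):

```sql
\timing on

-- Large effective_cache_size: the build-size estimate exceeds the cutoff,
-- so the pre-sorting path is skipped and the build is slow.
SET effective_cache_size = '6GB';
CREATE INDEX test_hash_slow ON test_hash USING hash (num);

-- Small/default effective_cache_size (less than the predicted index
-- size): the pre-sorting path is chosen and the build is fast.
RESET effective_cache_size;
CREATE INDEX test_hash_fast ON test_hash USING hash (num);
```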
Re: [PATCHES] hash index improving v3
On Fri, Sep 12, 2008 at 8:29 AM, Kenneth Marshall <[EMAIL PROTECTED]> wrote:
> On Thu, Sep 11, 2008 at 08:51:53PM -0600, Alex Hunsaker wrote:
>> On Thu, Sep 11, 2008 at 9:24 AM, Kenneth Marshall <[EMAIL PROTECTED]> wrote:
>> > Alex,
>> >
>> > I meant to check the performance with increasing numbers of collisions,
>> > not increasing size of the hashed item. In other words, something like
>> > this:
>> >
>> > for ($coll=500; $coll<=$max_coll; $coll=$coll*2) {
>> >     for ($i=0; $i<=100; $i++) {
>> >         hash(int8 $i);
>> >     }
>> >     # add the appropriate number of collisions, distributed evenly to
>> >     # minimize the packing overrun problem
>> >     for ($dup=0; $dup<=$coll; $dup++) {
>> >         hash(int8 MAX_INT + $dup * 100/$coll);
>> >     }
>> > }
>> >
>> > Ken
>>
>> *doh* right, something like this...
>>
>> create or replace function create_test_hash() returns bool as $$
>> declare
>>     coll integer default 500;
>>     -- tweak this to where create index gets really slow
>>     max_coll integer default 100;
>> begin
>>     loop
>>         execute 'create table test_hash_'|| coll ||'(num int8);';
>>         execute 'insert into test_hash_'|| coll ||' (num) select n
>>             from generate_series(0, '|| max_coll ||') as n;';
>>         execute 'insert into test_hash_'|| coll ||' (num) select
>>             (n+4294967296) * '|| max_coll ||'/'|| coll ||'::int from
>>             generate_series(0, '|| coll ||') as n;';
>>
>>         coll := coll * 2;
>>
>>         exit when coll >= max_coll;
>>     end loop;
>>     return true;
>> end;
>> $$ language 'plpgsql';
>>
>> And then benchmark each table, and for extra credit cluster the table
>> on the index and benchmark that.
>>
>> Also obviously with the hashint8 which just ignores the top 32 bits.
>>
>> Right?
>>
> Yes, that is exactly right.
>
> Ken

Ok, I finally found time to do this. In summary, it looks like v5 scales
about the same as cvs head when the collisions are spread evenly
(obviously not HEAD with the hash patch applied...). I couldn't test
cluster because we can't cluster on hash indexes...
benchmark with 50,000,000 rows and 500 collisions:

index creation time:
  head: 326116.647 ms
  v5:   269509.026 ms

pgbench -n -c1 -T600 -f q.sql hash:
  head: tps = 3226.605611
  v5:   tps = 3412.64 (excluding connections establishing)

50,000,000 rows and 32,768,000 collisions:

index time:
  head: 576600.967 ms
  v5:   487949.490 ms

pgbench -n -c1 -T500 -f q.sql hash:
  head: tps = 3105.270185
  v5:   tps = 3382.25782

You can see each result from 500 all the way up to 32,768,000 collisions
in the attached result.out.

Attached files:
  create_test_hash.sql: function I used to create the test tables
  result.out: output from bench.pl which shows the pgbench results and
    the create index times
  bench.pl: stupid little perl script to pgbench each of the tables
    created by create_test_hash.sql

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches
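The contents of q.sql are not included in the archive; a single-row equality probe of roughly this shape (a hypothetical reconstruction — the real file may differ) is the kind of query that would exercise the hash index under pgbench:

```sql
-- q.sql (hypothetical reconstruction): probe one hashed key per transaction.
SELECT count(*) FROM test_hash_500 WHERE num = 42;
```

Run as in the invocations above, e.g. `pgbench -n -c1 -T600 -f q.sql hash`, where `hash` is the database name.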
Re: [PATCHES] [HACKERS] Infrastructure changes for recovery
On Thu, 2008-09-18 at 10:09 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > On Thu, 2008-09-18 at 09:06 -0400, Tom Lane wrote:
> >> Do we really need a checkpoint there at all?
>
> > "Timelines only change at shutdown checkpoints".
>
> Hmm. I *think* that that is just a debugging crosscheck rather than a
> critical property. But yeah, it would take some close investigation,
> which maybe isn't warranted if you have a less-invasive solution.

OK, new patch, version 6. Some major differences to previous patch:

* new IsRecoveryProcessingMode() in shmem
* padding in XLogCtl to ensure above call is cheap
* specific part of bgwriter shmem for passing restartpoint data
* avoid Shutdown checkpoint at end of recovery, with carefully
  considered positioning of statements (beware!)
* only one new postmaster mode, PM_RECOVERY
* bgwriter changes state without stopping/starting

Modes I have tested so far:
* make check
* Start, Stop
* Crash Recovery
* Archive Recovery
* Archive Recovery, switch in middle of restartpoint

Modes not yet tested:
* EXEC_BACKEND

Ready for serious review prior to commit. I will be performing further
testing also.

 backend/access/transam/multixact.c |    2
 backend/access/transam/xlog.c      |  328 ---
 backend/postmaster/bgwriter.c      |  371 +---!
 backend/postmaster/postmaster.c    |   62 !!
 backend/storage/buffer/README      |    5
 backend/storage/buffer/bufmgr.c    |   34 +!!
 include/access/xlog.h              |   14 !
 include/access/xlog_internal.h     |    3
 include/catalog/pg_control.h       |    2
 include/postmaster/bgwriter.h      |    2
 include/storage/bufmgr.h           |    2
 include/storage/pmsignal.h         |    1
 12 files changed, 279 insertions(+), 56 deletions(-), 491 mods(!)

There are a few subtle points along the way. I've tried to explain them
all in code comments, but questions welcome. At v6, most things are now
done a particular way for a specific reason.

Look especially at InRecovery, which is used extensively in different
parts of the code.
The meaning of this has been subdivided into two meanings, so only *some*
of the places that use it have been changed. All have been checked.

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support

Index: src/backend/access/transam/multixact.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/multixact.c,v
retrieving revision 1.28
diff -c -r1.28 multixact.c
*** src/backend/access/transam/multixact.c	1 Aug 2008 13:16:08 -0000	1.28
--- src/backend/access/transam/multixact.c	22 Sep 2008 19:28:56 -0000
***************
*** 1543,1549 ****
  	 * SimpleLruTruncate would get confused. It seems best not to risk
  	 * removing any data during recovery anyway, so don't truncate.
  	 */
! 	if (!InRecovery)
  		TruncateMultiXact();
  
  	TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_DONE(true);
--- 1543,1549 ----
  	 * SimpleLruTruncate would get confused. It seems best not to risk
  	 * removing any data during recovery anyway, so don't truncate.
  	 */
! 	if (!IsRecoveryProcessingMode())
  		TruncateMultiXact();
  
  	TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_DONE(true);

Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.317
diff -c -r1.317 xlog.c
*** src/backend/access/transam/xlog.c	11 Aug 2008 11:05:10 -0000	1.317
--- src/backend/access/transam/xlog.c	22 Sep 2008 21:30:24 -0000
***************
*** 119,124 ****
--- 119,125 ----
  /* Are we doing recovery from XLOG? */
  bool		InRecovery = false;
+ bool		reachedSafeStopPoint = false;
  
  /* Are we recovering using offline XLOG archives? */
  static bool InArchiveRecovery = false;
***************
*** 131,137 ****
  static bool recoveryTarget = false;
  static bool recoveryTargetExact = false;
  static bool recoveryTargetInclusive = true;
- static bool recoveryLogRestartpoints = false;
  static TransactionId recoveryTargetXid;
  static TimestampTz recoveryTargetTime;
  static TimestampTz recoveryLastXTime = 0;
--- 132,137 ----
***************
*** 286,295 ****
--- 286,297 ----
  /*
   * Total shared-memory state for XLOG.
   */
+ #define XLOGCTL_BUFFER_SPACING 128
  typedef struct XLogCtlData
  {
  	/* Protected by WALInsertLock: */
  	XLogCtlInsert Insert;
+ 	char	InsertPadding[XLOGCTL_BUFFER_SPACING - sizeof(XLogCtlInsert)];
  
  	/* Protected by info_lck: */
  	XLogwrtRqst LogwrtRqst;
***************
*** 297,305 ****
--- 299,314 ----
  	uint32		ckptXidEpoch;	/* nextXID & epoch of latest checkpoint */
  	TransactionId ckptXid;
  	XLogRecPtr	asyncCommitLSN; /* LSN of newest async commit */
+ 	/* add data structure padding for above info_lck declarations */
+ 	char	InfoPadding[XLOGCTL_BUFFER_SPACING - sizeof(XLogwrtRqst) +