Re: [PATCHES] hash index improving v3

2008-09-22 Thread Simon Riggs

On Tue, 2008-09-23 at 00:48 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > I wasn't very happy with effective_cache_size and not happy with
> > shared_buffers either. If building hash indexes is memory critical then
> > we just need to say so and encourage others to set memory use correctly.
> > People are already aware that maintenance_work_mem needs to be increased
> > for large index builds and we will confuse people if we ignore that and
> > use another parameter instead.
> 
> I think you've got this completely backwards.  The general theory about
> maintenance_work_mem is "set it as large as you can stand it".  The
> issue at hand here is that the crossover point for hash index sort
> building seems to be a good deal less than all-the-memory-you-have.

There's an optimal point for btree builds using sorts that is also a good
deal less than all-the-memory-you-have, so I don't get that.

Plus, if you give it all-the-memory-you-have there won't be anything
left for anything else. You might set it that high for an empty database
load but you don't set it that high in production unless you want to see
swapping when you create large indexes.

maintenance_work_mem is already used for 3 separate operations that bear
little resemblance to each other. If it's appropriate for all of those
then it's appropriate for this usage also. Doing it otherwise is going to
mean more than 50% of people do the wrong thing, which would be a shame
because what's been done looks very cool.
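
To make the point concrete, the habit being appealed to is a one-line,
per-session setting before the build (a minimal sketch; the table and
size here are illustrative, not from this thread):

    SET maintenance_work_mem = '1GB';  -- session-local; no restart required
    CREATE INDEX test_hash_idx ON test_hash USING hash (num);
    RESET maintenance_work_mem;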

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PATCHES] hash index improving v3

2008-09-22 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> I wasn't very happy with effective_cache_size and not happy with
> shared_buffers either. If building hash indexes is memory critical then
> we just need to say so and encourage others to set memory use correctly.
> People are already aware that maintenance_work_mem needs to be increased
> for large index builds and we will confuse people if we ignore that and
> use another parameter instead.

I think you've got this completely backwards.  The general theory about
maintenance_work_mem is "set it as large as you can stand it".  The
issue at hand here is that the crossover point for hash index sort
building seems to be a good deal less than all-the-memory-you-have.

Perhaps there is a case for giving this behavior its very own
configuration parameter; but seeing that we still don't have all that
much of a use case for hash indexes at all, I don't feel a need to do
that yet.  In any case, tying it to maintenance_work_mem is certainly
wrong.

regards, tom lane



Re: [PATCHES] hash index improving v3

2008-09-22 Thread Simon Riggs

On Tue, 2008-09-23 at 00:05 -0400, Tom Lane wrote:
> "Jonah H. Harris" <[EMAIL PROTECTED]> writes:
> > On Mon, Sep 22, 2008 at 11:25 PM, Alex Hunsaker <[EMAIL PROTECTED]> wrote:
> >> On Sun, Sep 14, 2008 at 8:16 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
> >>> I'm considering changing hashbuild to switch over at shared_buffers instead
> >>> of effective_cache_size --- any thoughts about that?
> >> 
> >> Switching to shared_buffers gets my vote; on my test table with
> >> 50,000,000 rows it takes about 8 minutes to create an index using the
> >> default effective_cache_size.  With effective_cache_size set to 6GB
> >> (machine has 8GB) it's still going an hour later.
> 
> > Agreed.  I think using shared_buffers as a cutoff is a much better idea as well.
> 
> Already done in CVS a week or so back, but thanks for following up with
> some confirmation.

I think nobody noticed that change, or the discussion.

I wasn't very happy with effective_cache_size and not happy with
shared_buffers either. If building hash indexes is memory critical then
we just need to say so and encourage others to set memory use correctly.
People are already aware that maintenance_work_mem needs to be increased
for large index builds and we will confuse people if we ignore that and
use another parameter instead. At the very least, a controllable
parameter is the only appropriate choice for production systems, not one
that cannot be changed without restart. It would also throw away any
chance of having pg_restore of hash indexes run in parallel, since it
would not be able to control memory usage. Setting maintenance_work_mem
higher than shared_buffers is easily possible now, and we should just use
that rather than try to prejudge how people will and can configure their
systems. If the maintenance_work_mem default is too low, let's increase it.
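
The operational difference is easy to show (an illustrative sketch): the
build threshold now tied to shared_buffers is fixed for the life of the
server, while maintenance_work_mem can be bounded per backend:

    SHOW shared_buffers;               -- fixed until the next restart
    -- per-session, so each parallel pg_restore worker could be capped:
    SET maintenance_work_mem = '512MB';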

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PATCHES] hash index improving v3

2008-09-22 Thread Tom Lane
"Jonah H. Harris" <[EMAIL PROTECTED]> writes:
> On Mon, Sep 22, 2008 at 11:25 PM, Alex Hunsaker <[EMAIL PROTECTED]> wrote:
>> On Sun, Sep 14, 2008 at 8:16 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
>>> I'm considering changing hashbuild to switch over at shared_buffers instead
>>> of effective_cache_size --- any thoughts about that?
>> 
>> Switching to shared_buffers gets my vote; on my test table with
>> 50,000,000 rows it takes about 8 minutes to create an index using the
>> default effective_cache_size.  With effective_cache_size set to 6GB
>> (machine has 8GB) it's still going an hour later.

> Agreed.  I think using shared_buffers as a cutoff is a much better idea as well.

Already done in CVS a week or so back, but thanks for following up with
some confirmation.

regards, tom lane



Re: [PATCHES] hash index improving v3

2008-09-22 Thread Alex Hunsaker
On Mon, Sep 22, 2008 at 7:57 PM, Alex Hunsaker <[EMAIL PROTECTED]> wrote:
> 50,000,000 rows and 32,768,000 collisions

I should mention that's 50 million rows + 32 million more, i.e.
82,768,000 rows in total, which explains some of the added index
creation time...

> index time:
> head:  576600.967 ms
> v5: 487949.490 ms



Re: [PATCHES] hash index improving v3

2008-09-22 Thread Jonah H. Harris
On Mon, Sep 22, 2008 at 11:25 PM, Alex Hunsaker <[EMAIL PROTECTED]> wrote:
> On Sun, Sep 14, 2008 at 8:16 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
>> BTW, one thing I noticed was that the hash index build time for the
>> "wide column" case got a lot worse after applying the patch (from 56 to
>> 237 sec).  The reason for this turned out to be that with the smaller
>> predicted index size, the code decided not to use the pre-sorting method
>> that was recently added.  Reducing effective_cache_size to less than the
>> index size brought the time back down, to about 54 sec.  So it would
>> seem that effective_cache_size is too large a cutoff value.  I'm
>> considering changing hashbuild to switch over at shared_buffers instead
>> of effective_cache_size --- any thoughts about that?
>
> Switching to shared_buffers gets my vote; on my test table with
> 50,000,000 rows it takes about 8 minutes to create an index using the
> default effective_cache_size.  With effective_cache_size set to 6GB
> (machine has 8GB) it's still going an hour later.


Agreed.  I think using shared_buffers as a cutoff is a much better idea as well.

-- 
Jonah H. Harris, Senior DBA
myYearbook.com



Re: [PATCHES] hash index improving v3

2008-09-22 Thread Alex Hunsaker
On Sun, Sep 14, 2008 at 8:16 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
> BTW, one thing I noticed was that the hash index build time for the
> "wide column" case got a lot worse after applying the patch (from 56 to
> 237 sec).  The reason for this turned out to be that with the smaller
> predicted index size, the code decided not to use the pre-sorting method
> that was recently added.  Reducing effective_cache_size to less than the
> index size brought the time back down, to about 54 sec.  So it would
> seem that effective_cache_size is too large a cutoff value.  I'm
> considering changing hashbuild to switch over at shared_buffers instead
> of effective_cache_size --- any thoughts about that?

Switching to shared_buffers gets my vote; on my test table with
50,000,000 rows it takes about 8 minutes to create an index using the
default effective_cache_size.  With effective_cache_size set to 6GB
(machine has 8GB) it's still going an hour later.
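
For reference, a minimal sketch of this kind of timing run (the exact
table definition isn't shown in the thread, so the one below is an
assumed equivalent built with generate_series):

    -- psql session; \timing reports each statement's elapsed time
    \timing on
    CREATE TABLE test_hash (num int8);
    INSERT INTO test_hash SELECT n FROM generate_series(0, 50000000) AS n;
    SET effective_cache_size = '6GB';  -- the setting that picked the slow path
    CREATE INDEX test_hash_idx ON test_hash USING hash (num);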



Re: [PATCHES] hash index improving v3

2008-09-22 Thread Alex Hunsaker
On Fri, Sep 12, 2008 at 8:29 AM, Kenneth Marshall <[EMAIL PROTECTED]> wrote:
> On Thu, Sep 11, 2008 at 08:51:53PM -0600, Alex Hunsaker wrote:
>> On Thu, Sep 11, 2008 at 9:24 AM, Kenneth Marshall <[EMAIL PROTECTED]> wrote:
>> > Alex,
>> >
>> > I meant to check the performance with increasing numbers of collisions,
>> > not increasing size of the hashed item. In other words, something like
>> > this:
>> >
> >> > # $max_coll = base row count / largest collision count to test
> >> > for ($coll=500; $coll<=$max_coll; $coll=$coll*2) {
> >> >   for ($i=0; $i<=$max_coll; $i++) {
> >> >     hash(int8 $i);
> >> >   }
> >> >   # add the appropriate number of collisions, distributed evenly to
> >> >   # minimize the packing overrun problem
> >> >   for ($dup=0; $dup<=$coll; $dup++) {
> >> >     hash(int8 MAX_INT + $dup * $max_coll/$coll);
> >> >   }
> >> > }
>> >
>> > Ken
>>
>> *doh* right something like this...
>>
>> create or replace function create_test_hash() returns bool as $$
>> declare
>>     coll integer default 500;
>>     -- tweak this to where create index gets really slow;
>>     -- 50,000,000 matches the runs reported below
>>     max_coll integer default 50000000;
>> begin
>>     loop
>>         execute 'create table test_hash_'|| coll ||' (num int8);';
>>         -- base rows 0 .. max_coll
>>         execute 'insert into test_hash_'|| coll ||' (num) select n
>>             from generate_series(0, '|| max_coll ||') as n;';
>>         -- collisions: same low 32 bits as evenly spaced base rows
>>         execute 'insert into test_hash_'|| coll ||' (num) select
>>             4294967296 + n::int8 * '|| max_coll ||' / '|| coll ||' from
>>             generate_series(0, '|| coll ||') as n;';
>>
>>         coll := coll * 2;
>>
>>         exit when coll >= max_coll;
>>     end loop;
>>     return true;
>> end;
>> $$ language 'plpgsql';
>>
>> And then benchmark each table, and for extra credit cluster the table
>> on the index and benchmark that.
>>
>> Also obviously with the hashint8 which just ignores the top 32 bits.
>>
>> Right?
>>
> Yes, that is exactly right.
>
> Ken

Ok, I finally found time to do this. In summary, it looks like v5 scales
about the same as cvs head when the collisions are spread evenly (by
"head" I mean, obviously, HEAD without the hash patch applied...). I
couldn't test cluster because we can't cluster on hash indexes...

benchmark with 50,000,000 rows and 500 collisions:
index creation time:
head: 326116.647 ms
v5: 269509.026 ms

pgbench  -n -c1 -T600 -f q.sql hash
head: tps = 3226.605611
v5:  tps = 3412.64 (excluding connections establishing)

50,000,000 rows and 32,768,000 collisions
index time:
head:  576600.967 ms
v5: 487949.490 ms

pgbench -n -c1 -T500 -f q.sql hash
head: tps = 3105.270185
v5: tps = 3382.25782

You can see each result from 500 all the way up to 32,768,000
collisions in the attached result.out
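
The q.sql script itself isn't reproduced in the archive; a plausible
stand-in that probes single keys through the index would be a pgbench
custom script along these lines (using the \setrandom syntax of
2008-era pgbench; table name from create_test_hash.sql):

    \setrandom n 0 50000000
    SELECT num FROM test_hash_500 WHERE num = :n;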

Attached files:
create_test_hash.sql: function I used to create the test tables
result.out: output from bench.pl, which shows the pgbench results and
the create index times
bench.pl: stupid little perl script that runs pgbench against each of
the tables created by create_test_hash.sql





Re: [PATCHES] [HACKERS] Infrastructure changes for recovery

2008-09-22 Thread Simon Riggs

On Thu, 2008-09-18 at 10:09 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > On Thu, 2008-09-18 at 09:06 -0400, Tom Lane wrote:
> >> Do we really need a checkpoint there at all?
> 
> > "Timelines only change at shutdown checkpoints".
> 
> Hmm.  I *think* that that is just a debugging crosscheck rather than a
> critical property.  But yeah, it would take some close investigation,
> which maybe isn't warranted if you have a less-invasive solution.

OK, new patch, version 6. Some major differences to previous patch.

* new IsRecoveryProcessingMode() in shmem
* padding in XLogCtl to ensure above call is cheap
* specific part of bgwriter shmem for passing restartpoint data
* avoid Shutdown checkpoint at end of recovery, with carefully
considered positioning of statements (beware!)
* only one new postmaster mode, PM_RECOVERY
* bgwriter changes state without stopping/starting

Modes I have tested so far
* make check
* Start, Stop
* Crash Recovery
* Archive Recovery
* Archive Recovery, switch in middle of restartpoint

Modes not yet tested
* EXEC_BACKEND

Ready for serious review prior to commit. I will be performing further
testing also.

 backend/access/transam/multixact.c |2 
 backend/access/transam/xlog.c  |  328 ---
 backend/postmaster/bgwriter.c  |  371 +---!
 backend/postmaster/postmaster.c|   62 !!
 backend/storage/buffer/README  |5 
 backend/storage/buffer/bufmgr.c|   34 +!!
 include/access/xlog.h  |   14 !
 include/access/xlog_internal.h |3 
 include/catalog/pg_control.h   |2 
 include/postmaster/bgwriter.h  |2 
 include/storage/bufmgr.h   |2 
 include/storage/pmsignal.h |1 
 12 files changed, 279 insertions(+), 56 deletions(-), 491 mods(!)

There are a few subtle points along the way. I've tried to explain them
all in code comments, but questions welcome. As of v6, most things are
done a particular way for a specific reason.

Look especially at InRecovery, which is used extensively in different
parts of the code. Its meaning has been subdivided into two separate
meanings, so only *some* of the places that use it have been changed.
All have been checked.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
Index: src/backend/access/transam/multixact.c
===
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/multixact.c,v
retrieving revision 1.28
diff -c -r1.28 multixact.c
*** src/backend/access/transam/multixact.c	1 Aug 2008 13:16:08 -	1.28
--- src/backend/access/transam/multixact.c	22 Sep 2008 19:28:56 -
***
*** 1543,1549 
  	 * SimpleLruTruncate would get confused.  It seems best not to risk
  	 * removing any data during recovery anyway, so don't truncate.
  	 */
! 	if (!InRecovery)
  		TruncateMultiXact();
  
  	TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_DONE(true);
--- 1543,1549 
  	 * SimpleLruTruncate would get confused.  It seems best not to risk
  	 * removing any data during recovery anyway, so don't truncate.
  	 */
! 	if (!IsRecoveryProcessingMode())
  		TruncateMultiXact();
  
  	TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_DONE(true);
Index: src/backend/access/transam/xlog.c
===
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.317
diff -c -r1.317 xlog.c
*** src/backend/access/transam/xlog.c	11 Aug 2008 11:05:10 -	1.317
--- src/backend/access/transam/xlog.c	22 Sep 2008 21:30:24 -
***
*** 119,124 
--- 119,125 
  
  /* Are we doing recovery from XLOG? */
  bool		InRecovery = false;
+ bool		reachedSafeStopPoint = false;
  
  /* Are we recovering using offline XLOG archives? */
  static bool InArchiveRecovery = false;
***
*** 131,137 
  static bool recoveryTarget = false;
  static bool recoveryTargetExact = false;
  static bool recoveryTargetInclusive = true;
- static bool recoveryLogRestartpoints = false;
  static TransactionId recoveryTargetXid;
  static TimestampTz recoveryTargetTime;
  static TimestampTz recoveryLastXTime = 0;
--- 132,137 
***
*** 286,295 
--- 286,297 
  /*
   * Total shared-memory state for XLOG.
   */
+ #define	XLOGCTL_BUFFER_SPACING	128
  typedef struct XLogCtlData
  {
  	/* Protected by WALInsertLock: */
  	XLogCtlInsert Insert;
+ 	char	InsertPadding[XLOGCTL_BUFFER_SPACING - sizeof(XLogCtlInsert)];
  
  	/* Protected by info_lck: */
  	XLogwrtRqst LogwrtRqst;
***
*** 297,305 
--- 299,314 
  	uint32		ckptXidEpoch;	/* nextXID & epoch of latest checkpoint */
  	TransactionId ckptXid;
  	XLogRecPtr	asyncCommitLSN; /* LSN of newest async commit */
+ 	/* add data structure padding for above info_lck declarations */
+ 	char	InfoPadding[XLOGCTL_BUFFER_SPACING - sizeof(XLogwrtRqst) 
+