Re: [PATCHES] hash index improving v3

2008-09-22 Thread Alex Hunsaker
On Fri, Sep 12, 2008 at 8:29 AM, Kenneth Marshall [EMAIL PROTECTED] wrote:
 On Thu, Sep 11, 2008 at 08:51:53PM -0600, Alex Hunsaker wrote:
 On Thu, Sep 11, 2008 at 9:24 AM, Kenneth Marshall [EMAIL PROTECTED] wrote:
  Alex,
 
  I meant to check the performance with increasing numbers of collisions,
  not increasing size of the hashed item. In other words, something like
  this:
 
 for ($coll=500; $coll <= 1000000; $coll = $coll*2) {
   for ($i=0; $i <= 1000000; $i++) {
     hash(int8 $i);
   }
   # add the appropriate number of collisions, distributed evenly to
   # minimize the packing overrun problem
   for ($dup=0; $dup <= $coll; $dup++) {
     hash(int8 MAX_INT + $dup * 1000000/$coll);
   }
 }
 
  Ken

 *doh* right something like this...

 create or replace function create_test_hash() returns bool as $$
 declare
 coll integer default 500;
 -- tweak this to where create index gets really slow
 max_coll integer default 1000000;
 begin
 loop
 execute 'create table test_hash_'|| coll ||'(num int8);';
 execute 'insert into test_hash_'|| coll ||' (num) select n
 from generate_series(0, '|| max_coll ||') as n;';
 execute 'insert into test_hash_'|| coll ||' (num) select
 (n+4294967296) * '|| max_coll ||'/'|| coll ||'::int from
 generate_series(0, '|| coll ||') as n;';

 coll := coll * 2;

 exit when coll >= max_coll;
 end loop;
 return true;
 end;
 $$ language 'plpgsql';

 And then benchmark each table, and for extra credit cluster the table
 on the index and benchmark that.

 Also obviously with the hashint8 which just ignores the top 32 bits.

 Right?

 Yes, that is exactly right.

 Ken

Ok, I finally found time to do this.  In summary, v5 scales about the
same as CVS HEAD when the collisions are spread evenly (CVS HEAD
meaning without the hash patch applied, obviously...).  I couldn't
test CLUSTER because we can't cluster on hash indexes...

benchmark with 50,000,000 rows and 500 collisions:
index creation time:
head: 326116.647 ms
v5: 269509.026 ms

pgbench -n -c1 -T600 -f q.sql hash
head: tps = 3226.605611
v5:  tps = 3412.64 (excluding connections establishing)

50,000,000 rows and 32,768,000 collisions
index time:
head:  576600.967 ms
v5: 487949.490 ms

pgbench -n -c1 -T500 -f q.sql hash
head: tps = 3105.270185
v5: tps = 3382.25782

You can see each result from 500 all the way up to 32,768,000
collisions in the attached result.out.
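
q.sql isn't shown in this message; presumably (just my guess) it is a
single equality probe along the lines of:

select count(*) from test_hash_500 where num = 42;

generated by bench.pl for whichever test_hash_N table is being
benchmarked.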

Attached files:
create_test_hash.sql: function I used to create the test tables
result.out: output from bench.pl which shows the pgbench results and
the create index times
bench.pl: stupid little perl script to run pgbench against each of the
tables created by create_test_hash.sql


bench.pl
Description: Perl program


create_test_hash.sql
Description: Binary data


result.out
Description: Binary data



Re: [PATCHES] hash index improving v3

2008-09-22 Thread Alex Hunsaker
On Sun, Sep 14, 2008 at 8:16 PM, Tom Lane [EMAIL PROTECTED] wrote:
 BTW, one thing I noticed was that the hash index build time for the
 wide column case got a lot worse after applying the patch (from 56 to
 237 sec).  The reason for this turned out to be that with the smaller
 predicted index size, the code decided not to use the pre-sorting method
 that was recently added.  Reducing effective_cache_size to less than the
 index size brought the time back down, to about 54 sec.  So it would
 seem that effective_cache_size is too large a cutoff value.  I'm
 considering changing hashbuild to switch over at shared_buffers instead
 of effective_cache_size --- any thoughts about that?

Switching to shared_buffers gets my vote; on my test table with
50,000,000 rows it takes about 8 minutes to create an index using the
default effective_cache_size.  With effective_cache_size set to 6GB
(the machine has 8GB) it's still going an hour later.
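
For anyone wanting to reproduce the comparison, it boils down to
something like this (a sketch; the 32MB value is mine, anything smaller
than the predicted index size should do):

set effective_cache_size = '32MB';  -- below the predicted index size
create index test_hash_num_idx on test_hash using hash (num);  -- pre-sorted, fast

reset effective_cache_size;         -- cutoff larger than the index again
drop index test_hash_num_idx;
create index test_hash_num_idx on test_hash using hash (num);  -- no pre-sort, slow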



Re: [PATCHES] hash index improving v3

2008-09-22 Thread Alex Hunsaker
On Mon, Sep 22, 2008 at 7:57 PM, Alex Hunsaker [EMAIL PROTECTED] wrote:
 50,000,000 rows and 32,768,000 collisions

I should mention that's 50 million rows plus 32 million more, or
82,768,000 rows, which explains some of the added index creation time...

 index time:
 head:  576600.967 ms
 v5: 487949.490 ms



Re: [PATCHES] hash index improving v3

2008-09-12 Thread Alex Hunsaker
On Fri, Sep 12, 2008 at 3:14 AM, Zdenek Kotala [EMAIL PROTECTED] wrote:
 Alex Hunsaker wrote:

 On Wed, Sep 10, 2008 at 10:27 AM, Alex Hunsaker [EMAIL PROTECTED] wrote:

 On Wed, Sep 10, 2008 at 8:47 AM, Zdenek Kotala [EMAIL PROTECTED]
 wrote:

 What locale did you use? It would be nice to also have a comparison
 between C and any UTF8 locale. I think we should see a big benefit
 when a non-C locale is used.

 Err, yes, this was UTF8; I'll see about doing a C locale.


 And here it is with a C locale:

 pgbench -c1 -n -t10 -f bench.sql
 cvs head: tps = 5142.784262
 v5:   tps = 6169.405965


 If I look at both results

              C      UTF8   difference
 -------------------------------------
 cvs head:    5140   5050   -2%
 v5:          6170   5750   -7%
 improvement: 20%    14%

 then I wonder a little bit. I personally expected a bigger difference on
 UTF8 between CVS and v5. This test also shows that locale selection has a
 bigger impact on performance in the v5 case, but the result is still
 better than cvs head.

Right, I think part of it is I need to try again with a larger
dataset... The reason I did three runs before was that the results were
variable between runs (v5 between 5700 and 6200).  Also, the query I'm
doing might be too simplistic because it runs too fast; that's part of
why I get the variable speed between runs (I think...)

Suggestions?



Re: [PATCHES] hash index improving v3

2008-09-11 Thread Alex Hunsaker
On Thu, Sep 11, 2008 at 9:24 AM, Kenneth Marshall [EMAIL PROTECTED] wrote:
 Alex,

 I meant to check the performance with increasing numbers of collisions,
 not increasing size of the hashed item. In other words, something like
 this:

 for ($coll=500; $coll <= 1000000; $coll = $coll*2) {
   for ($i=0; $i <= 1000000; $i++) {
     hash(int8 $i);
   }
   # add the appropriate number of collisions, distributed evenly to
   # minimize the packing overrun problem
   for ($dup=0; $dup <= $coll; $dup++) {
     hash(int8 MAX_INT + $dup * 1000000/$coll);
   }
 }

 Ken

*doh* right something like this...

create or replace function create_test_hash() returns bool as $$
declare
coll integer default 500;
-- tweak this to where create index gets really slow
max_coll integer default 1000000;
begin
loop
execute 'create table test_hash_'|| coll ||'(num int8);';
execute 'insert into test_hash_'|| coll ||' (num) select n
from generate_series(0, '|| max_coll ||') as n;';
execute 'insert into test_hash_'|| coll ||' (num) select
(n+4294967296) * '|| max_coll ||'/'|| coll ||'::int from
generate_series(0, '|| coll ||') as n;';

coll := coll * 2;

exit when coll >= max_coll;
end loop;
return true;
end;
$$ language 'plpgsql';

And then benchmark each table, and for extra credit cluster the table
on the index and benchmark that.

Also obviously with the hashint8 which just ignores the top 32 bits.

Right?
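
For reference, the function would then presumably be driven along these
lines (a sketch of what the benchmark script automates; the index name
is mine):

select create_test_hash();
\timing on
create index test_hash_500_num_idx on test_hash_500 using hash (num);
-- ...repeat for each test_hash_N table, then point pgbench at it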



Re: [PATCHES] hash index improving v3

2008-09-10 Thread Alex Hunsaker
On Wed, Sep 10, 2008 at 8:47 AM, Zdenek Kotala [EMAIL PROTECTED] wrote:
 What locale did you use? It would be nice to also have a comparison between C
 and any UTF8 locale. I think we should see a big benefit when a non-C locale
 is used.

Err, yes, this was UTF8; I'll see about doing a C locale.



Re: [PATCHES] hash index improving v3

2008-09-10 Thread Alex Hunsaker
On Wed, Sep 10, 2008 at 10:27 AM, Alex Hunsaker [EMAIL PROTECTED] wrote:
 On Wed, Sep 10, 2008 at 8:47 AM, Zdenek Kotala [EMAIL PROTECTED] wrote:
 What locale did you use? It would be nice to also have a comparison between C
 and any UTF8 locale. I think we should see a big benefit when a non-C locale
 is used.

 Err, yes, this was UTF8; I'll see about doing a C locale.


And here it is with a C locale:

pgbench -c1 -n -t10 -f bench.sql
cvs head: tps = 5142.784262
v5:   tps = 6169.405965


and just for fun, the same using a btree index:
pgbench -c1 -n -t10 -f bench.sql
cvs head: tps = 5234.680407
v5:   tps = 5286.08252



Re: [PATCHES] hash index improving v3

2008-09-10 Thread Alex Hunsaker
On Wed, Sep 10, 2008 at 7:04 AM, Kenneth Marshall [EMAIL PROTECTED] wrote:
 On Tue, Sep 09, 2008 at 07:23:03PM -0600, Alex Hunsaker wrote:
 On Tue, Sep 9, 2008 at 7:48 AM, Kenneth Marshall [EMAIL PROTECTED] wrote:
  I think that the glacial speed for generating a big hash index is
  the same problem that the original code faced.

 Yeah sorry, I was not saying it was a new problem with the patch.  Err
 at least not trying to :) *Both* of them had been running for 18+ hours (I
 finally killed them sometime Sunday or around +32 hours...)

  It would be useful to have an equivalent test for the hash-only
  index without the modified int8 hash function, since that would
  be more representative of its performance. The collision rates
  that I was observing in my tests of the old and new mix() functions
 were about 2 * (1/1) of what your test generated. You could just
 test against the integers between 1 and 2,000,000.

 Sure, but then it's pretty much just a general test of patch vs no
 patch.  i.e. How do we measure how much longer collisions take when
 the new patch makes things faster?  That's what I was trying to
 measure... Though I apologize I don't think that was clearly stated
 anywhere...

 Right, I agree that we need to benchmark the collision processing
 time difference. I am not certain that two data points are useful
 information. There are 469 collisions with our current hash function
 on the integers from 1 to 2,000,000. What about testing the performance
 at power-of-2 multiples of 500, i.e. 500, 1000, 2000, 4000, 8000,...
 Unless you adjust the fill calculation for the CREATE INDEX, I would
 stop once the time to create the index spikes. It might also be useful
 to see if a CLUSTER affects the performance as well. What do you think
 of that strategy?

Not sure it will be a good benchmark of collision processing.  Then
again, you seem to have studied the hash algo closer than me.  I'll go
see about doing this.  Stay tuned.



Re: [PATCHES] hash index improving v3

2008-09-10 Thread Alex Hunsaker
On Wed, Sep 10, 2008 at 9:49 PM, Alex Hunsaker [EMAIL PROTECTED] wrote:
 On Wed, Sep 10, 2008 at 7:04 AM, Kenneth Marshall [EMAIL PROTECTED] wrote:
 On Tue, Sep 09, 2008 at 07:23:03PM -0600, Alex Hunsaker wrote:
 On Tue, Sep 9, 2008 at 7:48 AM, Kenneth Marshall [EMAIL PROTECTED] wrote:
  I think that the glacial speed for generating a big hash index is
  the same problem that the original code faced.

 Yeah sorry, I was not saying it was a new problem with the patch.  Err
 at least not trying to :) *Both* of them had been running for 18+ hours (I
 finally killed them sometime Sunday or around +32 hours...)

  It would be useful to have an equivalent test for the hash-only
  index without the modified int8 hash function, since that would
  be more representative of its performance. The collision rates
  that I was observing in my tests of the old and new mix() functions
 were about 2 * (1/1) of what your test generated. You could just
 test against the integers between 1 and 2,000,000.

 Sure, but then it's pretty much just a general test of patch vs no
 patch.  i.e. How do we measure how much longer collisions take when
 the new patch makes things faster?  That's what I was trying to
 measure... Though I apologize I don't think that was clearly stated
 anywhere...

 Right, I agree that we need to benchmark the collision processing
 time difference. I am not certain that two data points are useful
 information. There are 469 collisions with our current hash function
 on the integers from 1 to 2,000,000. What about testing the performance
 at power-of-2 multiples of 500, i.e. 500, 1000, 2000, 4000, 8000,...
 Unless you adjust the fill calculation for the CREATE INDEX, I would
 stop once the time to create the index spikes. It might also be useful
 to see if a CLUSTER affects the performance as well. What do you think
 of that strategy?

 Not sure it will be a good benchmark of collision processing.  Then
 again, you seem to have studied the hash algo closer than me.  I'll go
 see about doing this.  Stay tuned.

Assuming I understood you correctly (and I probably didn't), this does
not work very well because you max out at 27,006 values before you get
this error:
ERROR:  index row size 8152 exceeds hash maximum 8144
HINT:  Values larger than a buffer page cannot be indexed.

So is a power-of-2 multiple of 500 not simply:
x = 500;
while(1)
{
print x;
x *= 2;
}

?



Re: [PATCHES] hash index improving v3

2008-09-09 Thread Alex Hunsaker
On Tue, Sep 9, 2008 at 7:48 AM, Kenneth Marshall [EMAIL PROTECTED] wrote:
 I think that the glacial speed for generating a big hash index is
 the same problem that the original code faced.

Yeah sorry, I was not saying it was a new problem with the patch.  Err
at least not trying to :) *Both* of them had been running for 18+ hours (I
finally killed them sometime Sunday or around +32 hours...)

 It would be useful to have an equivalent test for the hash-only
 index without the modified int8 hash function, since that would
 be more representative of its performance. The collision rates
 that I was observing in my tests of the old and new mix() functions
 were about 2 * (1/1) of what your test generated. You could just
 test against the integers between 1 and 2,000,000.

Sure, but then it's pretty much just a general test of patch vs no
patch.  i.e. How do we measure how much longer collisions take when
the new patch makes things faster?  That's what I was trying to
measure... Though I apologize I don't think that was clearly stated
anywhere...
Now of course it still would be interesting...  And if it's only to
2,000,000 I can still use the modified int8 or just use the int4
one...

Anyway, here are the numbers:
create table test_hash(num int8);
insert into test_hash (num) select generate_series(1, 2000000);
create index test_hash_num_idx on test_hash (num);

pgbench -c1 -n -t1 -f bench_index.sql
cvs head: tps = 3161.500511
v5:   tps = 7248.808839

BTW I'm still planning on doing a wide vs narrow test... sometime... :)



Re: [PATCHES] hash index improving v3

2008-09-09 Thread Alex Hunsaker
On Tue, Sep 9, 2008 at 7:23 PM, Alex Hunsaker [EMAIL PROTECTED] wrote:
 create table test_hash(num int8);
 insert into test_hash (num) select generate_series(1, 2000000);
 create index test_hash_num_idx on test_hash (num);

 pgbench -c1 -n -t1 -f bench_index.sql
 cvs head: tps = 3161.500511
 v5:   tps = 7248.808839

Whoa... Ignore those numbers!

It would be nice to actually test with a hash index... and my CVS
HEAD at the moment had cassert on... sorry about that.

create table test_hash(num int8);
insert into test_hash (num) select generate_series(1, 2000000);
create index test_hash_num_idx on test_hash using hash (num);

pgbench -c1 -n -t10 -f bench_index.sql
cvs head: tps = 7345.69432
v5:   tps = 7526.290462
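
(bench_index.sql is only attached, not shown; presumably it is just an
equality probe that exercises the index, something like

select count(*) from test_hash where num = 1000000;

with the constant picked from the inserted range.)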



Re: [PATCHES] hash index improving v3

2008-09-09 Thread Alex Hunsaker
On Tue, Sep 9, 2008 at 7:23 PM, Alex Hunsaker [EMAIL PROTECTED] wrote:
 BTW I'm still planning on doing a wide vs narrow test... sometime... :)

narrow: (exactly the same as what I just did in the other post)
create table test_hash(num int8);
insert into test_hash (num) select generate_series(1, 2000000);
create index test_hash_num_idx on test_hash using hash (num);

pgbench -c1 -n -t10 -f bench_index.sql
cvs head: tps = 7345.69432
v5:   tps = 7526.290462


wide:
# NOTE: not on the same machine as the narrow test was run!

# spit out 2,000,000 random 100-character strings
perl gen.pl > data.sql
create table test_hash (wide text);
copy test_hash from './data.sql';
create index test_hash_num_idx on test_hash using hash (wide);
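
gen.pl is only attached, not shown; a rough SQL stand-in that produces
the same shape of data, 2,000,000 random 100-character strings (hex
alphabet rather than A-Z, but that shouldn't matter for hashing), would
be:

insert into test_hash (wide)
select upper(md5(random()::text) || md5(random()::text) ||
             md5(random()::text) || substr(md5(random()::text), 1, 4))
from generate_series(1, 2000000);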

bench.sql:
select a.wide from test_hash as a inner join test_hash as b on b.wide
= a.wide where a.wide =
'BJNORSLMITGKHJCWDBLKLYRSJTVPTYXZJPWNBKXGHYFNDHRAKNFMDHRMUXLDXNTRBJMTHPGPBFJZPAENZXDHAHCUSCJTUPUXWCXUH';

# ^ that string is in data.sql

# 3 runs each
pgbench -c1 -n -t10 -f bench.sql
cvs head: tps = 5073.463498, 5110.620923, 4955.347610
v5:   tps = 5870.681336, 5740.007837, 5699.002942


gen.pl
Description: Perl program



Re: [PATCHES] to_date() validation

2008-09-06 Thread Alex Hunsaker
On Fri, Aug 29, 2008 at 7:39 PM, Brendan Jurd [EMAIL PROTECTED] wrote:
 Hi all,

 Well, it's been a long time coming, but I've finally got a patch to
 improve the error handling of to_date() and to_timestamp(), as
 previously discussed on -hackers. [1][2]

BTW, -patches is obsolete; just submit patches to -hackers.

I'm just following this:
http://wiki.postgresql.org/wiki/Reviewing_a_Patch so let's get started.

Submission review: Everything looks good.  Tests seem reasonable, patch
applies, etc.

Usability review: I think both linked threads in the parent mail give
sufficient justification.

Feature test: Everything seems to work as advertised.  However, the old
sscanf()-based parsing limited the maximum length of most inputs: extra
digits were silently ignored before, whereas now they get added to the
result:
patched:
# SELECT to_timestamp('1', 'HHMM');
         to_timestamp
------------------------------
 0009-03-19 11:00:00-06:59:56

8.3.3:
# SELECT to_timestamp('1', 'HHMM');
        to_timestamp
----------------------------
 0001-11-01 11:00:00-07 BC

Performance review: a simple pgbench of to_timestamp did not show any
noticeable differences.

Coding review: seemed fine to me; code matched surrounding style,
comments made sense, etc.

Code review: one minor nit
*** a/src/backend/utils/adt/formatting.c
--- b/src/backend/utils/adt/formatting.c
***
*** 781,787  static const KeyWord DCH_keywords[] = {
{"y", 1, DCH_Y, TRUE, FROM_CHAR_DATE_GREGORIAN},

/* last */
!   {NULL, 0, 0, 0}
  };

  /* --
--- 781,787 ----
{"y", 1, DCH_Y, TRUE, FROM_CHAR_DATE_GREGORIAN},

/* last */
!   {NULL, 0, 0, 0, 0}
  };

  /* --
***



Re: [PATCHES] hash index improving v3

2008-09-06 Thread Alex Hunsaker
On Sat, Sep 6, 2008 at 1:09 PM, Tom Lane [EMAIL PROTECTED] wrote:
For the convenience of anyone intending to test, here is an updated
patch against CVS HEAD that incorporates Alex's fix.

Here are the results for a table containing 1 million entries that
will generate hash collisions.  It paints a bad picture for the patch,
but then again I'm not sure how relevant the issue is.  For example,
yesterday I imported a table with 10 million collisions and the create
index is still running (now at about ~18 hours).  Maybe we should warn
if there are lots of collisions when creating the index and suggest
you use a btree?  Anyway, here are the results.

./pgbench -c1 -n -t10 -f bench_createindex.sql
cvs head: tps = 0.002169
v5  : tps = 0.002196

pgbench -c1 -n -t1000 -f bench_bitmap.sql
cvs head: tps = 24.011871
v5:   tps = 2.543123

pgbench -c1 -n -t1000 -f bench_index.sql
cvs head: tps = 51.614502
v5:   tps = 3.205542

pgbench -c1 -n -t1000 -f bench_seqscan.sql
cvs head: tps = 8.553318
v5:   tps = 9.836091

Table created via:
create table test_hash (num int8);
./hash | psql -c 'copy test_hash from stdin;'
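
The pgbench scripts are only attached, not shown; my assumption is that
they differ only in which planner toggles they set before the same
probe, roughly:

-- bench_index.sql (assumed)
set enable_bitmapscan = off;
select count(*) from test_hash where num = 4294967296;

-- bench_bitmap.sql (assumed)
set enable_indexscan = off;
select count(*) from test_hash where num = 4294967296;

-- bench_seqscan.sql (assumed)
set enable_indexscan = off;
set enable_bitmapscan = off;
select count(*) from test_hash where num = 4294967296;

with bench_createindex.sql presumably creating and dropping the hash
index.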


bench_create.sql
Description: Binary data


bench_index.sql
Description: Binary data


bench_seqscan.sql
Description: Binary data


int8collide.patch
Description: Binary data
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

int main(void)
{
	unsigned long y = 0;
	unsigned cnt = 0;

	while (cnt < 1000000)
	{
		/* step by 2^32 so every value collides under a hash that
		 * ignores the upper 32 bits */
		y += UINT_MAX;
		y += 1;

		printf("%lu\n", y);

		cnt++;
	}

	return 0;
}



Re: [PATCHES] hash index improving v3

2008-09-06 Thread Alex Hunsaker
2008/9/6 Alex Hunsaker [EMAIL PROTECTED]:
 pgbench -c1 -n -t1000 -f bench_bitmap.sql
 cvs head: tps = 24.011871
 v5:   tps = 2.543123

Oops, forgot to attach bench_bitmap.sql.


bench_bitmap.sql
Description: Binary data



Re: [PATCHES] hash index improving v3

2008-09-05 Thread Alex Hunsaker
Ok, now that I made it actually *test* collisions: with the patch it
always returns all rows that matched the hashed key.  For example
(non-lobotomized hashint8; I just brute-forced some collisions for 0
so that anyone can try this):

create table test_hash (num int8);
insert into test_hash (num) values (0), (1805671691), (3294821164),
(4294967297);
create index test_hash_num_idx on test_hash using hash (num);
select * from test_hash where num = 0;
    num
------------
          0
 1805671691
 3294821164
 4294967297

set enable_bitmapscan to off;

select * from test_hash where num = 0;
 num
-----
   0

CVS HEAD and 8_3_STABLE obviously work if I force the index scan.
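
Since the hash function is exposed at the SQL level, the brute-forced
collisions can be double-checked directly; if these four values really
collide, the following should return four identical hash codes:

select hashint8(0::int8), hashint8(1805671691::int8),
       hashint8(3294821164::int8), hashint8(4294967297::int8);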



Re: [PATCHES] hash index improving v3

2008-09-05 Thread Alex Hunsaker

On Fri, Sep 5, 2008 at 2:21 PM, Alex Hunsaker [EMAIL PROTECTED] wrote:
 Ok now that I made it so it actually *test* collisions, with the patch
 it always returns all rows that matched the hashed key.

And here is the fix; we just forgot to set the recheck flag for bitmap scans.

*** a/src/backend/access/hash/hash.c
--- b/src/backend/access/hash/hash.c
***
*** 317,323  hashgetbitmap(PG_FUNCTION_ARGS)
/* Save tuple ID, and continue scanning */
if (add_tuple)
{
!   tbm_add_tuples(tbm, &scan->xs_ctup.t_self, 1, false);
ntids++;
}

--- 317,323 ----
/* Save tuple ID, and continue scanning */
if (add_tuple)
{
!   tbm_add_tuples(tbm, &scan->xs_ctup.t_self, 1, true);
ntids++;
}



Re: [PATCHES] hash index improving v3

2008-09-04 Thread Alex Hunsaker
On Thu, Sep 4, 2008 at 7:13 PM, Tom Lane [EMAIL PROTECTED] wrote:
 Alex Hunsaker [EMAIL PROTECTED] writes:
 Ok, let me know if this is too naive of an approach or not hitting the
 right cases you want tested.

 You have the unique-versus-not dimension, but I'm also wondering about
 narrow vs wide index keys (say about 8 bytes vs 50-100 or so).  In the
 former case we're not saving any index space by storing only the hash
 code, so these could be expected to have different performance
 behaviors.

Argh, yes... I just read the last part of your mail in this thread.  I
think it was the one on -hackers that talked about narrow vs wide...
so I figured I would just try to do what the thread where you posted
the patch talked about, namely the below:

So my thinking right now is that we should just test this patch as-is.
If it doesn't show really horrid performance when there are lots of
hash key collisions, we should forget the store-both-things idea and
just go with this.

So I thought, let's try to generate lots of hash collisions... though
obviously using the same key won't do that... Not sure what I was thinking.

 As for specifics of the suggested scripts:

 * might be better to do select count(*) not select 1, so that client
 communication is minimized

Yar.

 * check that the queries actually use the indexes (not sure that the
 proposed switch settings ensure this, not to mention you didn't create
 the indexes)

Well, I was assuming I could just test the speed of a hash join...

 * make sure the pgbench transaction counts are large enough to ensure
 significant runtime
 * the specific table sizes suggested are surely not large enough

Ok



Re: [PATCHES] hash index improving v3

2008-09-04 Thread Alex Hunsaker
On Thu, Sep 4, 2008 at 7:45 PM, Tom Lane [EMAIL PROTECTED] wrote:
 So what we need for testing is a few different key values that hash to
 the same code.  Not sure about an easy way to find such.

Hrm, well I have not really looked at the hash algorithm, but I assume
we could just reduce the number of buckets?



Re: [PATCHES] hash index improving v3

2008-09-04 Thread Alex Hunsaker
On Thu, Sep 4, 2008 at 8:17 PM, Tom Lane [EMAIL PROTECTED] wrote:
 I guess one thing we could do for testing purposes is lobotomize one of
 the datatype-specific hash functions.  For instance, make int8_hash
 return the input mod 2^32, ignoring the upper bytes.  Then it'd be easy
 to compute different int8s that hash to the same thing.


Heh, OK, I'm slowly getting there... So we lobotomize hashint8 and then
time how long it takes to make an index on a table... something like:
create table test_hash(num int8);

(obviously on a 64-bit machine)
#include <stdio.h>
#include <limits.h>

int main(void)
{
    unsigned long y = 0;
    unsigned cnt = 0;

    printf("insert into test_hash (num) values ");

    //while(cnt != LONG_MAX/UINT_MAX)
    while (cnt < 1000)
    {
        /* note: step by 2^32 (UINT_MAX + 1), not 2^32 - 1, so the
         * values actually collide under a hash that ignores the
         * upper 32 bits */
        y += UINT_MAX;
        y += 1;

        printf("(%lu), ", y);

        cnt++;
    }

    printf("(0);\n");

    return 0;
}

./a.out | psql

pgbench -c 1 -t1000 -n -f test.sql

test.sql:
create index test_hash_num_idx on test_hash using hash (num);
drop index test_hash_num_idx;

For both pre and post patch, just to make sure post patch is not worse
than pre patch?

If I'm still way off and it's not too much trouble, want to give me a
test case to run =) ?

Or maybe, because hash collisions should be fairly rare, it's not
something to really worry about?



Re: [PATCHES] [HACKERS] Patch for Prevent pg_dump/pg_restore from being affected by statement_timeout

2008-06-24 Thread Alex Hunsaker
On Mon, Jun 23, 2008 at 4:51 PM, Bruce Momjian [EMAIL PROTECTED] wrote:
 I would like to do this without adding a new --use-statement-timeout
 flag.  Is anyone going to want to honor statement_timeout during
 pg_dump/pg_restore?  I thought we were just going to disable it.

I believe so.  This was from when not everyone was convinced.  I'm
fairly certain Josh's original patch is in the commit fest, so feel
free to drop this one.



Re: [PATCHES] Tentative patch for making DROP put dependency info in DETAIL

2008-06-12 Thread Alex Hunsaker
On Wed, Jun 11, 2008 at 4:07 PM, Tom Lane [EMAIL PROTECTED] wrote:
 Agreed --- I committed what I had, anyone want to volunteer for
 refactoring the execution of DropStmt?

Sure!  See the attached patch...

 After looking again, I think that this is not technically very
 difficult, but coming up with something that looks tasteful to everyone
 might be tricky.  In particular I didn't see a nice way to do it without
 using struct ObjectAddress in a bunch of header files that don't
 currently include dependency.h.  A possible response to that is to move
 ObjectAddress into postgres.h, but that seems a bit ugly too.

Ok, I'm obviously missing something important... Why not just make the
various Remove* functions take a list?

I'm not proposing this patch for actual submission; it's more of a
"would this work?".  If I'm not missing something glaringly obvious,
I'll go ahead and make the rest of the Remove things behave the same way.


dropStmt_mutlidelete.patch
Description: Binary data



Re: [PATCHES] Tentative patch for making DROP put dependency info in DETAIL

2008-06-12 Thread Alex Hunsaker
On Thu, Jun 12, 2008 at 11:35 AM, Alvaro Herrera
[EMAIL PROTECTED] wrote:
 I don't think there's anything wrong with that in principle.  However,
 does your patch actually work?  The changes in expected/ are unexpected,
 I think.

Yeah, I thought they looked a bit odd at first too.  I thought it would
just get rid of the duplicate NOTICEs.  On closer look, they don't
NOTICE anymore because all the tables are listed in the drop.  Here is
an example:

# with all of them in the drop table
create table test (a int primary key);
create table test_a (a int references test);
create table test_b (a int references test);
drop table test, test_a, test_b cascade;
DROP TABLE

# now without test_b
create table test (a int primary key);
create table test_a (a int references test);
create table test_b (a int references test);
drop table test, test_a cascade;
NOTICE:  drop cascades to constraint test_b_a_fkey on table test_b
DROP TABLE

In fact you don't even need the cascade anymore if you specify all the
dependent tables.
So that certainly *seems* right to me.



Re: [PATCHES] Tentative patch for making DROP put dependency info in DETAIL

2008-06-12 Thread Alex Hunsaker
On Thu, Jun 12, 2008 at 11:58 AM, Tom Lane [EMAIL PROTECTED] wrote:
 I don't really like the patch though; it seems kind of a brute force
 solution.  You've got ProcessUtility iterating through a list of objects
 and doing a little bit of work on each one, and then making a new list
 that RemoveRelation (which is now misnamed) again iterates through and
 does a little bit of work on each one, and then passes that off *again*.
 There's no principled division of labor at work there; in particular
 it's far from obvious where things get locked.  You've also managed
 to embed an assumption not previously present, that all the objects in
 the list are of exactly the same type.

Yep, I thought about doing the reverse, namely just passing the
DropStmt to RemoveRelation(s).  But then all the permission check
functions are in utility.c.  Splitting those out seemed to be about
the same as splitting out all the ObjectAddress stuff...
Plus, with the potential ugliness, I thought maybe this way (as it is
in the patch), while still ugly, might be the lesser of two uglies :)
And besides, it was brain-dead simple...

 I think it would be better if the DropStmt loop were responsible
 for constructing a list of ObjectAddresses that it handed off directly
 to performMultipleDeletions.  This is why I was imagining replacing
 RemoveRelation et al with something that passed back an ObjectAddress,
 and getting worried about introducing references to ObjectAddress into
 all those header files.  Another possibility would be to get rid of
 RemoveRelation et al altogether and embed that code right into the
 DropStmt processing (though at this point we'd need to split it out
 of ProcessUtility; it's already a bit large for where it is).  That
 didn't seem tremendously clean either, though it would at least have
 the virtue of improving consistency about where privilege checks etc
 get done.


It seems strange to _not_ have the permission checks in
RemoveRelation, IMHO.  Granted, utility.c is the only thing that calls
it...  It seems equally strange to (re)move RemoveRelation and friends
into utility.c.  But really, if some other user besides utility.c were
going to use them, wouldn't they want the permission checks?  So my
vote is to just move them into utility.c, maybe even make them static
(until someone else needs them, obviously), make them return an
ObjectAddress (so they won't be called RemoveType, but getTypeAddress
or something), and be done with it.  Thoughts?

Unless you thought of a cleaner way? :)



Re: [PATCHES] Tentative patch for making DROP put dependency info in DETAIL

2008-06-12 Thread Alex Hunsaker
On Thu, Jun 12, 2008 at 5:35 PM, Tom Lane [EMAIL PROTECTED] wrote:
 Alex Hunsaker [EMAIL PROTECTED] writes:
 Yep, I thought about doing the reverse.  Namely Just passing the
 DropStmt to RemoveRelation(s).  But then all the permission check
 functions are in utility.c.  Splitting those out seemed to be about
 the same as splitting out all the ObjectAddress stuff...

 Well, that might actually be a good approach: try to get ProcessUtility
 back down to being just a dispatch switch.  It's pretty much of a wart
 that we're doing any permissions checking in utility.c at all.  Possibly
 those functions should be moved to aclchk.c and then used from
 RemoveRelation(s) and friends, which would stay where they are but
 change API.

OK, I'll work up a patch.  What's that saying about sticking with your
first instinct?



Re: [PATCHES] Tentative patch for making DROP put dependency info in DETAIL

2008-06-12 Thread Alex Hunsaker
On Thu, Jun 12, 2008 at 5:35 PM, Tom Lane [EMAIL PROTECTED] wrote:
 So that's leading me to lean towards keeping RemoveRelation
 et al where they are and distributing the work currently done in
 ProcessUtility out to them.  This sounds duplicative, but about all that
 really is there to duplicate is a foreach loop, which you're going to
 need anyway if the routines are to handle multiple objects.

OK, here it is:
- Moves CheckDropPermissions and friends from utility.c to aclchk.c
(pg_drop_permission_check)

- Makes all the Remove* functions take a DropStmt *; they each do their
own foreach() loop and permission checks

- Removes RemoveView and RemoveIndex because they were exactly the same
as RemoveRelation

- Adds an s to the end of the Remove* functions to denote they may
remove more than one object (i.e. RemoveRelations)

- Consolidates RemoveType and RemoveDomain into a common function
(static void removeHelper())

- Makes performMultipleDeletions log the same way (with the object
name) when there is only one item being deleted

 src/backend/catalog/aclchk.c  |  154 +++
 src/backend/catalog/dependency.c  |9 +-
 src/backend/catalog/pg_conversion.c   |   54 ---
 src/backend/commands/conversioncmds.c |   45 +++--
 src/backend/commands/indexcmds.c  |   27 ---
 src/backend/commands/schemacmds.c |   91 +
 src/backend/commands/tablecmds.c  |   66 ++-
 src/backend/commands/tsearchcmds.c|  290 +
 src/backend/commands/typecmds.c   |  189 ---
 src/backend/commands/view.c   |   23 ---
 src/backend/tcop/utility.c|  288 +
 src/include/catalog/pg_conversion_fn.h|2 +-
 src/include/commands/conversioncmds.h |3 +-
 src/include/commands/defrem.h |   14 +-
 src/include/commands/schemacmds.h |2 +-
 src/include/commands/tablecmds.h  |2 +-
 src/include/commands/typecmds.h   |4 +-
 src/include/commands/view.h   |2 +-
 src/include/utils/acl.h   |1 +
 src/test/regress/expected/foreign_key.out |   11 -
 src/test/regress/expected/truncate.out|6 -
 21 files changed, 645 insertions(+), 638 deletions(-)


refactor_dropstmt.patch.gz
Description: GNU Zip compressed data



[PATCHES] guc config_enum_entry add hidden field

2008-05-27 Thread Alex Hunsaker
Tom Lane [EMAIL PROTECTED] writes:
 I am wondering if it's a good idea to hide the redundant entries
 to reduce clutter in the pg_settings display.  (We could do this
 by adding a hidden boolean to struct config_enum_entry.)
 Thoughts?

The attached patch does just that...


guc_config_enum_entry_hide.patch
Description: Binary data



Re: [PATCHES] [EMAIL PROTECTED]: Re: [BUGS] Problem identifying constraints which should not be inherited]

2008-05-09 Thread Alex Hunsaker
On Fri, May 9, 2008 at 5:37 PM, Tom Lane [EMAIL PROTECTED] wrote:
 Alex Hunsaker [EMAIL PROTECTED] writes:
 [ patch to change inherited-check-constraint behavior ]

 Applied after rather heavy editorializations.  You didn't do very well on
 getting it to work in multiple-inheritance scenarios, such as

create table p (f1 int check (f10));
create table c1 (f2 int) inherits (p);
create table c2 (f3 int) inherits (p);
create table cc () inherits (c1,c2);

 Here the same constraint is multiply inherited.  The base case as above
 worked okay, but adding the constraint to an existing inheritance tree
 via ALTER TABLE, not so much.

Ouch.  OK, I'll (obviously) review what you committed so I can do a lot
better next time.
Thanks for muddling through it!
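
For concreteness, the failing ALTER TABLE case was presumably along
these lines (my reconstruction of the scenario, not taken from the
commit):

create table p (f1 int);
create table c1 (f2 int) inherits (p);
create table c2 (f3 int) inherits (p);
create table cc () inherits (c1, c2);
-- the check must reach cc once via c1 and once via c2, ending up with
-- coninhcount = 2 on cc rather than a duplicate constraint:
alter table p add check (f1 > 0);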

 I also didn't like the choice to add is_local/inhcount fields to
 ConstrCheck: that struct is fairly heavily used, but you were leaving the
 fields undefined/invalid in most code paths, which would surely lead to
 bugs down the road when somebody expected them to contain valid data.
 I considered extending the patch to always set them up, but rejected that
 idea because ConstrCheck is essentially a creature of the executor, which
 couldn't care less about constraint inheritance.  After some reflection
 I chose to put the fields in CookedConstraint instead, which is used only
 in the table creation / constraint addition code paths.  That required
 a bit of refactoring of the API of heap_create_with_catalog, but I think
 it ended up noticeably cleaner: constraints and defaults are fed to
 heap.c in only one format now.

That sounds *way* cleaner and hopefully got rid of some of those gross
hacks I had to do.
Interestingly enough, that's similar to how I initially started doing
it, but it felt way too intrusive, so I dropped it.
Of course, I then failed the non-intrusive test with the ConstrCheck
changes...

 I found one case that has not really worked as intended for a long time:
 ALTER TABLE ADD CHECK ... (that is, ADD CONSTRAINT without specifying
 a constraint name) failed to ensure that the same constraint name was used
 at child tables as at the parent, and thus the constraints ended up not
 being seen as related at all.  Fixing this was a bit ugly since it meant
 that ADD CONSTRAINT has to recurse to child tables during Phase 2 after
 all, and has to be able to add work queue entries for Phase 3 at that
 time, which is not something needed by any other ALTER TABLE operation.

Ouch...

 I'm not sure if we ought to try to back-patch that --- it'd be a
 behavioral change with non-obvious implications.  In the back branches,
 ADD CHECK followed by DROP CONSTRAINT will end up not deleting the
 child-table constraints, which is probably a bug but I wouldn't be
 surprised if applications were depending on the behavior.

Given the lack of complaints, it does not seem worth a back patch IMHO.





Re: [PATCHES] [EMAIL PROTECTED]: Re: [BUGS] Problem identifying constraints which should not be inherited]

2008-05-07 Thread Alex Hunsaker
On Wed, May 7, 2008 at 12:20 AM, Alex Hunsaker [EMAIL PROTECTED] wrote:
  Find attached a diff from v4-v5, and a full patch.

  src/backend/commands/tablecmds.c   |  242 +++-
  src/backend/utils/cache/syscache.c |   12 --
  src/include/catalog/indexing.h     |    2 -
  src/include/utils/syscache.h       |    1 -
  4 files changed, 153 insertions(+), 104 deletions(-)

  Currently this loops through all the constraints for a relation (old
 behavior of MergeAttributesIntoExisting)... Do you think it's worth
 adding a non-unique index to speed this up?  If so, I can whip up a
 patch real quick if you think it's worth it... else


*sigh* Here is a fix for a possible bogus "failed to find constraint"
error when we are trying to drop a constraint that is not a check
constraint.
(Interestingly, no regression tests failed... I caught it while
reviewing the patch I just posted.)

*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
*** ATExecDropConstraint(Relation rel, const
*** 5080,5094 ****

con = (Form_pg_constraint) GETSTRUCT(tuple);

-   if (con->contype != CONSTRAINT_CHECK)
-   continue;
-
if (strcmp(NameStr(con->conname),
   constrName) != 0)
continue;
else
found = true;

if (con->coninhcount <= 0)
elog(ERROR, "relation %u has non-inherited constraint \"%s\"",
childrelid, constrName);
--- 5080,5095 ----

con = (Form_pg_constraint) GETSTRUCT(tuple);

if (strcmp(NameStr(con->conname),
   constrName) != 0)
continue;
else
found = true;

+   /* Right now only CHECK constraints can be inherited */
+   if (con->contype != CONSTRAINT_CHECK)
+   continue;
+
if (con->coninhcount <= 0)
elog(ERROR, "relation %u has non-inherited constraint \"%s\"",
childrelid, constrName);



Re: [PATCHES] [HACKERS] Patch for Prevent pg_dump/pg_restore from being affected by statement_timeout

2008-04-16 Thread Alex Hunsaker
On Wed, Apr 16, 2008 at 4:54 PM, Alvaro Herrera
[EMAIL PROTECTED] wrote:
 Joshua D. Drake wrote:

  That is an interesting idea. Something like:
  
   pg_restore -E "SET STATEMENT_TIMEOUT=0; SET MAINTENANCE_WORK_MEM=1G"?

  We already have it -- it's called PGOPTIONS.


OK, but is not the purpose of the patch to turn off statement_timeout
by *default* in pg_restore/pg_dump?

Here is an updated version of the patch I posted above (with the
command line option --use-statement-timeout) for pg_dump and pg_restore.

(sorry if I hijacked your patch, Josh :) )


pg_dump_restore_statement_timeout.diff
Description: Binary data



Re: [PATCHES] [EMAIL PROTECTED]: Re: [BUGS] Problem identifying constraints which should not be inherited]

2008-03-31 Thread Alex Hunsaker
On Mon, Mar 31, 2008 at 2:36 AM, NikhilS [EMAIL PROTECTED] wrote:
 Hi Alex,

 I was taking a look at this patch to add the pg_dump related changes. Just
 wanted to give you a heads up as this patch crashes if we run make
 installcheck. Seems there is an issue introduced in the CREATE TABLE
 REFERENCES code path due to your patch (this is without my pg_dump changes
 just to be sure).  Looks like some memory overwrite issue. The trace is as
 follows:

Ouch, sorry I did not reply sooner... been out with the flu.  Oddly
enough, make check and make installcheck worked great on my 64-bit box.
But on my laptop (32 bits), make check lights up like a Christmas tree,
which is why I did not notice the problem. :(

Attached is a patch that fixes the problem... (it was leftover
debugging from an earlier version)
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index f105d39..7d12156 100644
*** a/src/backend/parser/parse_utilcmd.c
--- b/src/backend/parser/parse_utilcmd.c
*** transformColumnDefinition(ParseState *ps
*** 409,417 ****
  	{
  		constraint = lfirst(clist);
  
- 		constraint-is_local = true;
- 		constraint-inhcount = 0;
- 
  		/*
  		 * If this column constraint is a FOREIGN KEY constraint, then we fill
  		 * in the current attribute's name and throw it into the list of FK
--- 409,414 ----



Re: [PATCHES] [BUGS] BUG #3973: pg_dump using inherited tables do not always restore

2008-03-06 Thread Alex Hunsaker
snip

  It seems much more restrictive than necessary, plus it does nothing
  for the check-constraint case.  My recollection of the previous
  discussion about how to fix this was that we needed to add an inhcount
  column to pg_constraint, and add entries for not-null constraints (at
  least inherited ones) to pg_constraint so that they'd be able to have
  inhcount fields.  The latter would also allow us to attach names to
  not-null constraints, which I think is required by spec but we've never
  supported.

 regards, tom lane


OK, I found some time to look at what would be involved in that...
Seems doable.  I'll see what I can whip up in the next month.  (I'm
time pressed; who isn't, though?)  I'll just post whatever I come up
with (if and when) to pgsql-patches.



Re: [PATCHES] [BUGS] BUG #3973: pg_dump using inherited tables do not always restore

2008-03-03 Thread Alex Hunsaker
On Wed, Feb 20, 2008 at 3:55 PM, Tom Lane [EMAIL PROTECTED] wrote:
 Alex Hunsaker [EMAIL PROTECTED] writes:
   create table junk (val integer not null, val2 integer);
   create table junk_child () inherits (junk);
   alter table junk_child alter column val drop not null;
   insert into junk_child (val2) values (1);

   pg_dump -t junk -t junk_child

   pg_restore/psql will fail because junk_child.val now has a not null
   constraint

  Actually the bug is that ALTER TABLE allows you to do that.  It should
  not be possible to drop an inherited constraint, but right now there's
  not enough information in the system catalogs to detect the situation.
  Fixing this has been on the TODO list for awhile:

 o %Prevent child tables from altering or dropping constraints
   like CHECK that were inherited from the parent table

 regards, tom lane


Hrm, how about something like the attached patch?

It only handles SET NOT NULL / DROP NOT NULL.  And I thought about
making SET DEFAULT behave the same way, but I can see how that can
be useful in the real world.  Thoughts?

Arguably pg_dump should just do something similar to what it does for
SET DEFAULT (because that dumps correctly)... I only say that because
there are specific regression tests for the behavior I outlined above,
which is now broken with my patch.

Be gentle... it's my first dive into the PostgreSQL guts...


inhertied_null.patch
Description: Binary data
