This is a review of:
https://commitfest.postgresql.org/action/patch_view?id=468
Purpose:
========
Equal and not-equal _may_ be quickly determined if their lengths are different.
This _may_ be a huge speed up if we dont have to detoat.
The Patch:
==========
I was able to read and understand the patch, its a simple change and looked
correct to me (a non PG hacker).
It applies clean to git head, compiles and runs fine with debug enabled.
make check passes
Usability:
==========
I used _may_ above. The benchmark included with the patch, showing huge speedups, is
really contrived. It uses a where clause with a thousand character constant: (where c =
'long...long...long...long...ConstantText...etc'). In my opinion this is very uncommon
(the author does note this is a "best case"). If you have a field large enough
to be toasted you are not going to be using that to search on, you are going to have an
ID field that is indexed. (select c where id = 7)
This also only touches = and <>. > < and like wont be touched. So I think the
scope of this is limited.
THAT being said, the patch is simple, and if you do happen to hit the code, it
will speed things up. As a user of PG I'd like to have this included. Its a
corner case, but a big corner, and its a small, simple change, and it wont slow
anything else down.
Performance:
============
I created myself a more real world test, with a table with indexes and id's and
a large toasted field.
create table junk(id serial primary key, xgroup integer, c text);
create index junk_group on junk(xgroup);
I filled it full of junk:
do $$
declare i integer;
declare j integer;
begin
for i in 1..100 loop
for j in 1..500 loop
insert into junk(xgroup, c) values (j, 'c'||i);
insert into junk (xgroup, c) select j, repeat('abc',
2000)|| n from generate_series(1, 5) n;
end loop;
end loop;
end$$;
This will make about 600 records within the same xgroup. As well as a simple 'c15' type
of value in c we can search for. My thinking is you may not know the exact unique id,
but you do know what group its in, so that'll cut out 90% of the records, and then you'll
have to add " and c = 'c15'" to get the exact one you want.
I still saw a nice performance boost.
Old PG:
$ psql < bench3.sql
Timing is on.
DO
Time: 2010.412 ms
Patched:
$ psql < bench3.sql
Timing is on.
DO
Time: 184.602 ms
bench3.sql:
do $$
declare i integer;
begin
for i in 1..400 loop
perform count(*) from junk where xgroup = i and c like 'c' || i;
end loop;
end$$;
Summary:
========
Performance speed-up: Oh yeah! If you just happen to hit it, and if you do
hit it, you might want to re-think your layout a little bit.
Do I want it? Yes please.
--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers