Re: [HACKERS] WIP: Fast GiST index build

Heikki Linnakangas Tue, 30 Aug 2011 02:09:20 -0700

On 26.08.2011 17:18, Alexander Korotkov wrote:

On Thu, Aug 25, 2011 at 11:08 PM, Heikki Linnakangas<
[email protected]>  wrote:

Could you share the test scripts, patches and data sets etc. needed to
reproduce the tests you've been running? I'd like to try them out on a test
server.


1) I've updated links to the datasets on the wiki page.
2) Script for index quality testing fastbuild_test.php is in the attachment.
In order to run it you need PHP with pdo and pdo_pgsql modules. Also
plantuner moduler is required (it is used to force planer to use specific
index). After running that script following query returns relative score of
index quality:

select indexname, avg(count::real/(select count from test_result a2 where
a2.indexname = 'usnoa2_idx3' and a2.predicate = a1.predicate and
a2.tablename = a1.tablename)::real) from test_result a1 where a1.tablename =
'usnoa2' group by indexname;

where 'usnoa2' - table name, 'usnoa2_idx3' - name of index which quality was
assumed to be 1.
3) Patch which makes plantuner work with HEAD is also in attachment.
4) Patch with my split algorithm implementation is attached. Now it's form
is appropriate only for testing purposes.
5) For indexes creation I use simple script which is attached as
'indexes.sql'. Also, similar script with different index names I'm running
with my split patch.

Feel free to ask questions about all this stuff.

Thanks. Meanwhile, I hacked together my own set of test scripts, and let them run over the weekend. I'm still running tests with ordered data, but here are some preliminary results:


           testname           |   nrows   |    duration     | accesses
-----------------------------+-----------+-----------------+----------
 points unordered auto       | 250000000 | 08:08:39.174956 |  3757848
 points unordered buffered   | 250000000 | 09:29:16.47012  |  4049832
 points unordered unbuffered | 250000000 | 03:48:10.999861 |  4564986

As you can see, the results are very disappointing :-(. The buffered builds take a lot *longer* than unbuffered ones. I was expecting the buffering to be very helpful at least in these unordered tests. On the positive side, the buffering made index quality somewhat better (accesses column, smaller is better), but that's not what we're aiming at.

What's going on here? This data set was large enough to not fit in RAM, the table was about 8.5 GB in size (and I think the index is even larger than that), and the box has 4GB of RAM. Does the buffering only help with even larger indexes that exceed the cache size even more?



Test methodology
----------------

These tests consist of creating a gist index using the point datatype.

     Table "public.points"
  Column |  Type   | Modifiers
--------+---------+-----------
  x      | integer |
  y      | integer |

CREATE INDEX testindex ON points_ordered USING gist (point(x,y)) WITH (buffering = 'on');

The points in the table are uniformly distributed. In the 'unordered' tests, they are in random order. The ordered tests use the exact same data, but sorted by x, y coordinates.

The 'accesses' column measures the quality of the produced index. Smaller is better. It is calculated by performing a million queries on the table, selecting points within a small square at evenly spaced locations. Like:

(SELECT COUNT(*) FROM points WHERE point(x,y) <@ box(point(xx-20, yy-20), point(xx+20, yy+20)));

The number of index pages touched by those queries are count from pg_statio_user_indexes, and that number is reported in the 'accesses' column.

I've attached the whole script used. Pass the number of rows to use in the test as argument, and the script does the rest.


--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

rungisttests.sh
Description: Bourne shell script

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Fast GiST index build

Reply via email to