On Mon, Apr 23, 2012 at 08:34:44PM +0300, Ants Aasma wrote: > On Mon, Apr 23, 2012 at 4:37 PM, Sandro Santilli <s...@keybit.net> wrote: > > I'd love to see enhanced CTID operators, to fetch all visible tuples in a > > page > > using a tidscan. Something like: WHERE ctid =~ '(501,*)' or a ctidrange. > > Among other things, this would enable user-space implementation of > tablesample. Given the operator =~(tid, int) that matches the page > number and planner/executor integration so that it results in a TID > scan, you would need the following functions: > > random_pages(tbl regclass, samples int) returns int[] > aggregate function: > reservoir_sample(item anyelement, samples int) returns anyarray > > Implementations for both of the functions could be adapted from analyze.c. > > Then tablesample could be implemented with the following query: > SELECT (SELECT reservoir_sample(some_table, 50) AS samples > FROM some_table WHERE ctid =~ ANY (rnd_pgtids)) > FROM random_pages('some_table', 50) AS rnd_pgtids; > > Actually, now that I think about it, it could actually be implemented > without any modifications to core at some cost to efficiency. > random_pages would have to return tid[] that contains for each > generated pagenumber all possible tids on that page.
This is exactly what I'm after. I've actually started crafting such a TableSample function and I'm in the process to refine the signature so your suggested interface above is very useful, thanks ! But I don't understand the reservoir_sample call, what is it supposed to do ? And how flexibly "anyarray" return would be ? Could you return arbitrary typed rowtypes from it ? > By making the building blocks available users get more flexibility. > The downside would be that we can't automatically make better sampling > methods available. One approach doesn't preclude the other. TABLESAMPLE will still be useful, also for SQL compliance. --strk; ,------o-. | __/ | Delivering high quality PostGIS 2.0 ! | / 2.0 | http://strk.keybit.net - http://vizzuality.com `-o------' -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers