On Fri, Jan 15, 2016 at 3:29 PM, Peter Geoghegan <p...@heroku.com> wrote:
> On Fri, Jan 15, 2016 at 2:38 PM, Constantin S. Pan <kva...@gmail.com> wrote:
>> I have a draft implementation which divides the whole process between
>> N parallel workers, see the patch attached. Instead of a full scan of
>> the relation, I give each worker a range of blocks to read.
> I am currently working on a patch that allows B-Tree index builds to
> be performed in parallel. I think I'm a week or two away from posting
> it.
> Even without parallelism, wouldn't it be better if GIN indexes were
> built using tuplesort? I know way way less about the gin am than the
> nbtree am, but I imagine that a prominent cost for GIN index builds is
> constructing the main B-Tree (the one that's constructed over key
> values) itself. Couldn't tuplesort.c be adapted to cover this case?
> That would be much faster in general, particularly with the recent
> addition of abbreviated keys, while also leaving a clear path forward
> to performing the build in parallel.

I think it would take a lot of changes to tuple sort to make this be a
almost-always win.

In the general case each GIN key occurs in many tuples, and the
in-memory rbtree is good at compressing the tid list for each key to
maximize the amount of data that can be in memory (although perhaps it
could be even better, as it doesn't use varbyte encoding on the tid
list the way the posting lists on disk do--it could do so in the bulk
build case, where it receives the tid in order, but not feasibly in
the pending-list clean-up case, where it doesn't get them in order)

When I was testing building an index on a column of text identifiers
where there were a couple million identifiers occurring a few dozen
times each, building it as a gin index (using btree_gin so that plain
text columns could be indexed) was quite a bit faster than building it
as a regular btree index using tuple sort.  I didn't really
investigate this difference, but I assume it was due to the better
memory usage leading to less IO.

(Disclaimer: this was on those identifiers which had little difference
in the first 8 bytes, so they didn't really benefit from the
abbreviated keys.)

So in addition to shimming (key, tid) pairs to look like something
that can be fed to tuple sort, I think we would also need to make
tuple sort do on-the fly aggregation when it finds adjacent equal sort
columns, so it can make (key,tid[]) structures rather than (key,tid).
Without that, I think using tuple sort would be a loss at least as
often as it would be a win.

But I do think this with aggregation would be worth it despite the
amount of work involved.  In addition to automatically benefiting from
parallel sorts and any other future sort improvements, I think it
would also generate better indexes.  I have a theory that a problem
with very large gin indexes is that repeatedly building
maintenance_work_mem worth of rbtree entries and then merging them to
disk yields highly fragmented btrees, in which logical adjacent
key-space does not occupy physically nearby leaf pages, which then can
causes problems either with access that follows a pattern (like range
scans, except gin indexes can't really do those anyway) or further
bulk operations.

So I agree with that a better long term approach would probably be to
make gin index builds (at least the bulk ones, I don't know about the
pending-list-cleanup) to use a tuple sort.  But it looks like
Constantin has already done most of the work on his current proposal,
and no one has volunteered to do the work on converting it to tuple
sort instead.



Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to