On Fri, Jan 15, 2016 at 3:29 PM, Peter Geoghegan <p...@heroku.com> wrote: > On Fri, Jan 15, 2016 at 2:38 PM, Constantin S. Pan <kva...@gmail.com> wrote: >> I have a draft implementation which divides the whole process between >> N parallel workers, see the patch attached. Instead of a full scan of >> the relation, I give each worker a range of blocks to read. > > I am currently working on a patch that allows B-Tree index builds to > be performed in parallel. I think I'm a week or two away from posting > it. > > Even without parallelism, wouldn't it be better if GIN indexes were > built using tuplesort? I know way way less about the gin am than the > nbtree am, but I imagine that a prominent cost for GIN index builds is > constructing the main B-Tree (the one that's constructed over key > values) itself. Couldn't tuplesort.c be adapted to cover this case? > That would be much faster in general, particularly with the recent > addition of abbreviated keys, while also leaving a clear path forward > to performing the build in parallel.
I think it would take a lot of changes to tuple sort to make this be a almost-always win. In the general case each GIN key occurs in many tuples, and the in-memory rbtree is good at compressing the tid list for each key to maximize the amount of data that can be in memory (although perhaps it could be even better, as it doesn't use varbyte encoding on the tid list the way the posting lists on disk do--it could do so in the bulk build case, where it receives the tid in order, but not feasibly in the pending-list clean-up case, where it doesn't get them in order) When I was testing building an index on a column of text identifiers where there were a couple million identifiers occurring a few dozen times each, building it as a gin index (using btree_gin so that plain text columns could be indexed) was quite a bit faster than building it as a regular btree index using tuple sort. I didn't really investigate this difference, but I assume it was due to the better memory usage leading to less IO. (Disclaimer: this was on those identifiers which had little difference in the first 8 bytes, so they didn't really benefit from the abbreviated keys.) So in addition to shimming (key, tid) pairs to look like something that can be fed to tuple sort, I think we would also need to make tuple sort do on-the fly aggregation when it finds adjacent equal sort columns, so it can make (key,tid) structures rather than (key,tid). Without that, I think using tuple sort would be a loss at least as often as it would be a win. But I do think this with aggregation would be worth it despite the amount of work involved. In addition to automatically benefiting from parallel sorts and any other future sort improvements, I think it would also generate better indexes. I have a theory that a problem with very large gin indexes is that repeatedly building maintenance_work_mem worth of rbtree entries and then merging them to disk yields highly fragmented btrees, in which logical adjacent key-space does not occupy physically nearby leaf pages, which then can causes problems either with access that follows a pattern (like range scans, except gin indexes can't really do those anyway) or further bulk operations. So I agree with that a better long term approach would probably be to make gin index builds (at least the bulk ones, I don't know about the pending-list-cleanup) to use a tuple sort. But it looks like Constantin has already done most of the work on his current proposal, and no one has volunteered to do the work on converting it to tuple sort instead. Cheers, Jeff -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers