Re: Parallel CREATE INDEX for GIN indexes

2024-05-28 Thread Andy Fan
Hi Tomas, I have completed my first round of review, generally it looks good to me, more testing need to be done in the next days. Here are some tiny comments from my side, just FYI. 1. Comments about GinBuildState.bs_leader looks not good for me. /* * bs_leader is only

Re: Parallel CREATE INDEX for GIN indexes

2024-05-13 Thread Tomas Vondra
On 5/13/24 10:19, Andy Fan wrote: > > Tomas Vondra writes: > >> ... >> >> I don't understand the question. The blocks are distributed to workers >> by the parallel table scan, and it certainly does not do that block by >> block. But even it it did, that's not a problem for this code. > > OK, I

Re: Parallel CREATE INDEX for GIN indexes

2024-05-13 Thread Andy Fan
Tomas Vondra writes: >>> 7) v20240502-0007-Detect-wrap-around-in-parallel-callback.patch >>> >>> There's one more efficiency problem - the parallel scans are required to >>> be synchronized, i.e. the scan may start half-way through the table, and >>> then wrap around. Which however means the

Re: Parallel CREATE INDEX for GIN indexes

2024-05-10 Thread Tomas Vondra
On 5/10/24 07:53, Andy Fan wrote: > > Tomas Vondra writes: > >>> I guess both of you are talking about worker process, if here are >>> something in my mind: >>> >>> *btbuild* also let the WORKER dump the tuples into Sharedsort struct >>> and let the LEADER merge them directly. I think this aim

Re: Parallel CREATE INDEX for GIN indexes

2024-05-10 Thread Andy Fan
Tomas Vondra writes: >> I guess both of you are talking about worker process, if here are >> something in my mind: >> >> *btbuild* also let the WORKER dump the tuples into Sharedsort struct >> and let the LEADER merge them directly. I think this aim of this design >> is it is potential to

Re: Parallel CREATE INDEX for GIN indexes

2024-05-09 Thread Tomas Vondra
On 5/9/24 17:51, Matthias van de Meent wrote: > On Thu, 9 May 2024 at 15:13, Tomas Vondra > wrote: >> Let me explain the relevant part of the patch, and how I understand the >> improvement suggested by Matthias. The patch does the work in three phases: >> >> >> 1) Worker gets data from table,

Re: Parallel CREATE INDEX for GIN indexes

2024-05-09 Thread Matthias van de Meent
On Thu, 9 May 2024 at 15:13, Tomas Vondra wrote: > Let me explain the relevant part of the patch, and how I understand the > improvement suggested by Matthias. The patch does the work in three phases: > > > 1) Worker gets data from table, split that into index items and add > those into a

Re: Parallel CREATE INDEX for GIN indexes

2024-05-09 Thread Tomas Vondra
On 5/2/24 20:22, Tomas Vondra wrote: >> >>> For some of the opclasses it can regress (like the jsonb_path_ops). I >>> don't think that's a major issue. Or more precisely, I'm not surprised >>> by it. It'd be nice to be able to disable the parallel builds in these >>> cases somehow, but I haven't

Re: Parallel CREATE INDEX for GIN indexes

2024-05-09 Thread Tomas Vondra
On 5/9/24 11:44, Andy Fan wrote: > > Hello Tomas, > 2) v20240502-0002-Use-mergesort-in-the-leader-process.patch The approach implemented by 0001 works, but there's a little bit of issue - if there are many distinct keys (e.g. for trigrams that can happen very easily),

Re: Parallel CREATE INDEX for GIN indexes

2024-05-09 Thread Tomas Vondra
On 5/9/24 12:14, Andy Fan wrote: > > Tomas Vondra writes: > >> 3) v20240502-0003-Remove-the-explicit-pg_qsort-in-workers.patch >> >> In 0002 the workers still do an explicit qsort() on the TID list before >> writing the data into the shared tuplesort. But we can do better - the >> workers can

Re: Parallel CREATE INDEX for GIN indexes

2024-05-09 Thread Andy Fan
Tomas Vondra writes: > 3) v20240502-0003-Remove-the-explicit-pg_qsort-in-workers.patch > > In 0002 the workers still do an explicit qsort() on the TID list before > writing the data into the shared tuplesort. But we can do better - the > workers can do a merge sort too. To help with this, we

Re: Parallel CREATE INDEX for GIN indexes

2024-05-09 Thread Andy Fan
Hello Tomas, >>> 2) v20240502-0002-Use-mergesort-in-the-leader-process.patch >>> >>> The approach implemented by 0001 works, but there's a little bit of >>> issue - if there are many distinct keys (e.g. for trigrams that can >>> happen very easily), the workers will hit the memory limit with

Re: Parallel CREATE INDEX for GIN indexes

2024-05-02 Thread Tomas Vondra
On 5/2/24 19:12, Matthias van de Meent wrote: > On Thu, 2 May 2024 at 17:19, Tomas Vondra > wrote: >> >> Hi, >> >> In PG17 we shall have parallel CREATE INDEX for BRIN indexes, and back >> when working on that I was thinking how difficult would it be to do >> something similar to do that for

Re: Parallel CREATE INDEX for GIN indexes

2024-05-02 Thread Matthias van de Meent
On Thu, 2 May 2024 at 17:19, Tomas Vondra wrote: > > Hi, > > In PG17 we shall have parallel CREATE INDEX for BRIN indexes, and back > when working on that I was thinking how difficult would it be to do > something similar to do that for other index types, like GIN. I even had > that on my list of