On 7/21/24 21:31, Andrey M. Borodin wrote: > Hi Tomas! > >> On 7 Jun 2024, at 20:41, Tomas Vondra >> <tomas.von...@enterprisedb.com> wrote: >> >> After looking into parallel builds for BRIN and GIN indexes, I was >> wondering if there's a way to do parallel builds for GiST too. I >> knew next to nothing about how GiST works, but I gave it a shot and >> here's what I have - the attached patch allows parallel GiST builds >> for the "unsorted" case (i.e. when the opclass does not include >> sortsupport), and does not support buffered builds. > > I think this totally makes sense. I've took a look into tuples > partitioning (for sorted build) in your Github and I see that it's > complicated feature. So, probably, we can do it later. I'm trying to > review the patch as it is now. Currently I have some questions about > code.
OK. I'm not even sure partitioning is the right approach for sorted builds. Or how to do it, exactly. > > 1. Do I get it right that is_parallel argument for gistGetFakeLSN() > is only needed for assertion? And this assertion can be ensured just > by inspecting code. Is it really necessary? Yes, in the patch it's only used for an assert. But it's actually incorrect - just as I speculated in my initial message (in the section about gistGetFakeLSN), it sometimes fails to detect a page split. I noticed that while testing the patch adding GiST to amcheck, and I reported that in that thread [1]. But I apparently forgot to post an updated version of this patch :-( I'll post a new version tomorrow, but in short it needs to update the fake LSN even if (lastlsn != currlsn) if is_parallel=true. It's a bit annoying this means we generate a new fake LSN on every call, and I'm not sure that's actually necessary. But I've been unable to come up with a better condition when to generate a new LSN. [1] https://www.postgresql.org/message-id/79622955-6d1a-4439-b358-ec2b6a9e7cbf%40enterprisedb.com > 2. gistBuildParallelCallback() updates indtuplesSize, but it seems to be > not used anywhere. AFAIK it's only needed to buffered build. 3. I > think we need a test that reliably triggers parallel and serial > builds. > Yeah, it's possible the variable is unused. Agreed on the testing. > As far as I know, there's a well known trick to build better GiST > over PostGIS data: randomize input. I think parallel scan is just > what is needed, it will shuffle tuples enough... > I'm not sure I understand this comment. What do you mean by "better GiST" or what does that mean for this patch? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company