Hello,

Andrey and I discussed this on IM, and after some back and forth, he came up with a brilliant idea: modify the WAL record for multixact creation so that it also transmits the offset of the next multixact, which can then be replayed. (We know that offset when we create each multixact, because the number of members is known.) So the replica can store the offset of the next multixact right away, even though it doesn't yet know that multixact's members. On replay of the next multixact, we can cross-check that its offset matches what we had written previously. This allows reading the first multixact without having to wait for replay of the creation of the second multixact.
One concern: if we write the offset for the second mxact but haven't yet written its members, what happens if another process looks up the members of that multixact? We'll have to make it wait (retry) somehow. Given what was described upthread, it's possible for the multixact beyond that one to have been written already, so we won't see the zero offset that would make us wait.

Anyway, he's going to try to implement this. Andrey, please let me know if I misunderstood the idea.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/