On Fri, Oct 7, 2022 at 8:47 AM Masahiko Sawada <sawada.m...@gmail.com> wrote:
>
> On Thu, Oct 6, 2022 at 9:04 PM houzj.f...@fujitsu.com
> <houzj.f...@fujitsu.com> wrote:
> >
> > I think the root reason for this kind of deadlock problems is the table
> > structure difference between publisher and subscriber(similar to the unique
> > difference reported earlier[1]). So, I think we'd better disallow this case.
> > For example to avoid the reported problem, we could only support parallel
> > apply if pubviaroot is false on publisher and replicated tables' types(relkind)
> > are the same between publisher and subscriber.
> >
> > Although it might restrict some use cases, but I think it only restrict the
> > cases when the partitioned table's structure is different between publisher
> > and subscriber. User can still use parallel apply for cases when the table
> > structure is the same between publisher and subscriber which seems acceptable
> > to me. And we can also document that the feature is expected to be used for
> > the case when tables' structure are the same. Thoughts ?
>
> I'm concerned that it could be a big restriction for users. Having
> different partitioned table's structures on the publisher and the
> subscriber is quite common use cases.
>
> From the feature perspective, the root cause seems to be the fact that
> the apply worker does both receiving and applying changes. Since it
> cannot receive the subsequent messages while waiting for a lock on a
> table, the parallel apply worker also cannot move forward. If we have
> a dedicated receiver process, it can off-load the messages to the
> worker while another process waiting for a lock. So I think that
> separating receiver and apply worker could be a building block for
> parallel-apply.
>
The main disadvantage that comes to mind is the overhead of passing messages between the receiver and applier processes, even for non-parallel cases. I don't think it is advisable to handle non-parallel cases separately.

The other thing is that we need to somehow deal with feedback messages, which help advance synchronous replicas and update the subscriber's progress, which in turn keeps the restart point up to date. These messages also act as heartbeats between the walsender and the walapply process. To deal with this, one idea is to have two connections to the walsender process, one from the walreceiver and the other from the walapply process, but in my view that could lead to a big increase in resource consumption and would bring another set of complexities into the system.

Now, within this design, I think we have two possibilities:

(a) We pass all messages to the leader apply worker, which then decides whether to execute them serially or pass them to a parallel apply worker. However, that can again deadlock in the truncate scenario we discussed, because the leader apply worker won't be able to receive new messages once it is blocked on the truncate command.

(b) The walreceiver process itself takes care of passing streaming transactions to parallel apply workers. But then the walreceiver needs to wait at transaction end to maintain commit order, which means it can also deadlock if the truncate happens in a streaming transaction.

The other alternative is to let the walreceiver process wait for the apply process to finish the transaction and then send the feedback, but that again seems to be an overhead if we have to do it even for small transactions; in particular, it can delay synchronous replication. Even if we ignore the overhead, it can still lead to a deadlock, because the walreceiver won't be able to make progress in the scenario we are discussing.
About your point that having different partition structures on the publisher and subscriber is common, I don't know how common it will be once we have DDL replication. Also, the default value of publish_via_partition_root is false, which doesn't suggest that this is a very common case. We have fixed quite a few issues in this area in the last release or two, and those were found during development, so I'm not sure these setups are used very often in the field, though that could just be a coincidence. Also, it will only matter if large transactions are performed on such tables, and I don't think it is easy to predict whether those are common.

--
With Regards,
Amit Kapila.