On Wed, Aug 2, 2023 at 3:35 PM Masahiko Sawada <sawada.m...@gmail.com> wrote: > > On Mon, Jul 31, 2023 at 10:47 AM Peter Smith <smithpb2...@gmail.com> wrote: > > > > Hi hackers, > > > > BACKGROUND: > > > > The logical replication has different worker "types" (e.g. apply > > leader, apply parallel, tablesync). > > > > They all use a common structure called LogicalRepWorker, but at times > > it is necessary to know what "type" of worker a given LogicalRepWorker > > represents. > > > > Once, there were just apply workers and tablesync workers - these were > > easily distinguished because only tablesync workers had a valid > > 'relid' field. Next, parallel-apply workers were introduced - these > > are distinguishable from the apply leaders by the value of > > 'leader_pid' field. > > > > > > PROBLEM: > > > > IMO, deducing the worker's type by examining multiple different field > > values seems a dubious way to do it. This maybe was reasonable enough > > when there were only 2 types, but as more get added it becomes > > increasingly complicated. > > > > SOLUTION: > > > > AFAIK none of the complications is necessary anyway; the worker type > > is already known at launch time, and it never changes during the life > > of the process. > > > > The attached Patch 0001 introduces a new enum 'type' field, which is > > assigned during the launch. > > > > ~ > > > > This change not only simplifies the code, but it also permits other > > code optimizations, because now we can switch on the worker enum type, > > instead of using cascading if/else. (see Patch 0002). > > > > Thoughts? > > I can see the problem you stated and it's true that the worker's type > never changes during its lifetime. But I'm not sure we really need to > add a new variable to LogicalRepWorker since we already have enough > information there to distinguish the worker's type. > > Currently, we deduce the worker's type by checking some fields that > never change such as relid and leader_piid. It's fine to me as long as > we encapcel the deduction of worker's type by macros (or inline > functions). Even with the proposed patch (0001 patch), we still need a > similar logic to determine the worker's type.
Thanks for the feedback. I agree that the calling code will not look very different with/without this patch 0001, because everything is hidden by the macros/functions. But those HEAD macros/functions are also hiding the inefficiencies of the type deduction -- e.g. IMO it is quite strange that we can only recognize the worker is an "apply worker" by eliminating the other 2 possibilities. As further background, the idea for this patch came from considering another thread to "re-use" workers [1]. For example, when thinking about re-using tablesync workers the HEAD am_tablesync_worker() function begins to fall apart -- ie. it does not let you sensibly disassociate a tablesync worker from its relid (which you might want to do if tablesync workers were "pooled") because in the HEAD code any tablesync worker without a relid is (by definition) NOT recognized as a tablesync worker anymore! Those above issues just disappear when a type field is added. > > If we want to change both process_syncing_tables() and > should_apply_changes_for_rel() to use the switch statement instead of > using cascading if/else, I think we can have a function, say > LogicalRepWorkerType(), that returns LogicalRepWorkerType of the given > worker. > The point of that "switch" in patch 0002 was to demonstrate that with a worker-type enum we can write more *efficient* code in some places than the current cascading if/else. I think making a "switch" just for the sake of it, by adding more functions that hide yet more work, is going in the opposite direction. ------ [1] reuse workers - https://www.postgresql.org/message-id/flat/CAGPVpCTq%3DrUDd4JUdaRc1XUWf4BrH2gdSNf3rtOMUGj9rPpfzQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia