Hi Shveta,

On Wed, 1 Feb 2023 at 15:01, shveta malik <shveta.ma...@gmail.com> wrote:
> On Wed, Feb 1, 2023 at 5:05 PM Melih Mutlu <m.melihmu...@gmail.com> wrote:
>
> 2) I found a crash in the previous patch (v9), but have not tested it
> on the latest yet. The crash happens when all the replication slots are
> consumed and we are trying to create a new one. I tweaked the settings
> as below so that it can be reproduced easily:
> max_sync_workers_per_subscription=3
> max_replication_slots = 2
> and then ran the test case shared by you. I think there is some memory
> corruption happening. (I tested in debug mode; I have not tried
> release mode.) I tried to add some traces to identify the root cause.
> I observed that worker_1 keeps on moving from one table to another
> correctly, but at some point it gets corrupted, i.e. the origin name
> obtained for it is wrong. It tries to advance that origin, and since
> that origin does not exist, it asserts and then something else crashes.
> From the log: (new trace lines added by me are prefixed by shveta; I also
> tweaked the code to have my comment 1 fixed, for clarity on worker-id).
>
> From the traces below, it is clear that worker_1 was moving from one
> relation to another, always getting the correct origin 'pg_16688_1', but
> at the end it got 'pg_16688_49', which does not exist. The second part of
> the trace shows who updated 'pg_16688_49': it was done by worker_49, which
> did not even get a chance to create this origin because max_rep_slot
> was reached.
>

Thanks for investigating this error. I think it's the same error as the
one Shi reported earlier. [1] I couldn't reproduce it yet, but I will apply
your tweaks and try again.

Looking into this.

[1] https://www.postgresql.org/message-id/OSZPR01MB631013C833C98E826B3CFCB9FDC69%40OSZPR01MB6310.jpnprd01.prod.outlook.com

Thanks,
--
Melih Mutlu
Microsoft