Hi,

On 2018-04-06 02:28:17 +0200, Daniel Gustafsson wrote:
> Applying this makes the _cancel test pass, moving the failure instead to the
> following _enable test (which matches what coypu and mylodon are seeing).

FWIW, I'm somewhat annoyed that I'm now spending time debugging this to
get the buildfarm green again.

I'm fairly certain that the bug here is a simple race condition in the
test (not the main code!):

The flag informing whether the worker has started is cleared via an
on_shmem_exit() hook:

static void
launcher_exit(int code, Datum arg)
{
        ChecksumHelperShmem->abort = false;
        pg_atomic_clear_flag(&ChecksumHelperShmem->launcher_started);
}

but the the wait in the test is done via functions like:

        CREATE OR REPLACE FUNCTION test_checksums_on() RETURNS boolean AS $$
        DECLARE
                enabled boolean;
        BEGIN
                LOOP
                        SELECT setting = 'on' INTO enabled FROM 
pg_catalog.pg_settings WHERE name = 'data_checksums';
                        IF enabled THEN
                                EXIT;
                        END IF;
                        PERFORM pg_sleep(1);
                END LOOP;
                RETURN enabled;
        END;
        $$ LANGUAGE plpgsql;

        INSERT INTO t1 (b, c) VALUES (generate_series(1,10000), 'starting 
values');

        CREATE OR REPLACE FUNCTION test_checksums_off() RETURNS boolean AS $$
        DECLARE
                enabled boolean;
        BEGIN
                PERFORM pg_sleep(1);
                SELECT setting = 'off' INTO enabled FROM pg_catalog.pg_settings 
WHERE name = 'data_checksums';
                RETURN enabled;
        END;
        $$ LANGUAGE plpgsql;

which just waits for setting checksums to have finished.  It's
exceedingly unsurprising that a 'pg_sleep(1)' is not a reliable way to
make sure that a process has finished exiting.  Then followup tests fail
because the process is still running

Also:
        CREATE OR REPLACE FUNCTION reader_loop() RETURNS boolean AS $$
        DECLARE
                counter integer;
        BEGIN
                FOR counter IN 1..30 LOOP
                        PERFORM count(a) FROM t1;
                        PERFORM pg_sleep(0.2);
                END LOOP;
                RETURN True;
        END;
        $$ LANGUAGE plpgsql;
}

really?  Let's just force the test take at least 6s purely from
sleeping?

Greetings,

Andres Freund

Reply via email to