Re: [HACKERS] [sqlsmith] Failed assertion in _hash_splitbucket_guts
On Fri, Dec 2, 2016 at 10:04 PM, Amit Kapila wrote:
>> Here we shouldn't be accessing meta page after releasing the lock as
>> concurrent activity can change these values. This can be fixed by
>> storing these values in local variables before releasing the lock and
>> passing local variables in hashbucketcleanup(). I will send patch
>> shortly.
>
> Please find attached patch to fix above code.

Committed. I don't know either whether this will fix things for
Andreas, but it's certainly a bug fix in its own right.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in _hash_splitbucket_guts
On Sat, Dec 3, 2016 at 3:44 PM, Andreas Seltenreich wrote:
> Amit Kapila writes:
>
>> How should I connect to this database? If I use the user fdw
>> mentioned in pg_hba.conf (changed authentication method to trust in
>> pg_hba.conf), it says the user doesn't exist. Can you create a user
>> in the database which I can use?
>
> There is also a superuser "postgres" and an unprivileged user "smith"
> you should be able to login with. You could also start postgres in
> single-user mode to bypass the authentication altogether.
>

Thanks. I have checked, and my earlier speculation seems to be right:
the old bucket contains tuples from a previous split. At the location
of the Assert, I printed the old bucket, the new bucket, and the actual
bucket to which the tuple belongs. The result is below.

regression=# update public.hash_i4_heap set seqno = public.hash_i4_heap.random;
ERROR: wrong bucket, old bucket:37, new bucket:549, actual bucket:293

This means the tuple should belong to either bucket 37 or bucket 549,
but it actually belongs to 293. Both 293 and 549 are buckets that were
split from bucket 37 (you can verify that with the calculation used in
_hash_expandtable). I have checked the code again and couldn't find any
reason other than the one I mentioned in my previous mail. So, let us
wait for the results of your new test run.

> Amit Kapila writes:
>
>> Please find attached patch to fix above code. Now, if this is the
>> reason of the problem you are seeing, it won't fix your existing
>> database as it already contains some tuples in the wrong bucket. Can
>> you please re-run the test to see if you can reproduce the problem?
>
> Ok, I'll do testing with the patch applied.
>
> Btw, I also find entries like following in the logging database:
>
> ERROR: could not read block 2638 in file "base/16384/17256": read only 0 of
> 8192 bytes
>
> …with relfilenode being a hash index. I usually ignore these as they
> naturally start occurring after a recovery because of an unrelated crash.
> But since 11003eb, they also occur when the cluster has not yet suffered
> a crash.
>

Hmm, I am not sure whether this is related to the previous problem, but
it could be. Is it possible to get the operation and/or call stack for
the above failure?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
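The bucket arithmetic in the message above can be checked with a small standalone sketch. It is not PostgreSQL source: the helper names `hashkey_to_bucket` and `split_parent` are hypothetical, and the assumption is that a key maps to a bucket by masking with the high mask and falling back to the low mask when the result exceeds the current maximum bucket, while a new bucket's parent is the new bucket number masked with the low mask, as the message says _hash_expandtable computes it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: map a hash key to a bucket number using the
 * mask-and-fall-back scheme assumed in the lead-in.
 */
static uint32_t
hashkey_to_bucket(uint32_t hashkey, uint32_t maxbucket,
                  uint32_t highmask, uint32_t lowmask)
{
    uint32_t bucket = hashkey & highmask;

    /* Bucket does not exist yet, so the key still lives in its parent. */
    if (bucket > maxbucket)
        bucket = bucket & lowmask;
    return bucket;
}

/*
 * Hypothetical helper: the old bucket that new_bucket is split from.
 * The low mask at split time is (largest power of two <= new_bucket) - 1.
 */
static uint32_t
split_parent(uint32_t new_bucket)
{
    uint32_t pow2 = 1;

    assert(new_bucket > 0);     /* bucket 0 is never the result of a split */
    while (pow2 <= new_bucket / 2)
        pow2 *= 2;
    return new_bucket & (pow2 - 1);
}
```

With these helpers, `split_parent(293)` and `split_parent(549)` both yield 37, which matches the observation that both buckets descend from bucket 37. A key whose masked value is 293 maps to bucket 37 while the maximum bucket is still 292, and to bucket 293 once that bucket exists, so a cleanup or split that runs with outdated mask values can leave such a tuple behind in bucket 37.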
Re: [HACKERS] [sqlsmith] Failed assertion in _hash_splitbucket_guts
Amit Kapila writes:

> How should I connect to this database? If I use the user fdw
> mentioned in pg_hba.conf (changed authentication method to trust in
> pg_hba.conf), it says the user doesn't exist. Can you create a user
> in the database which I can use?

There is also a superuser "postgres" and an unprivileged user "smith"
you should be able to login with. You could also start postgres in
single-user mode to bypass the authentication altogether.

Amit Kapila writes:

> Please find attached patch to fix above code. Now, if this is the
> reason of the problem you are seeing, it won't fix your existing
> database as it already contains some tuples in the wrong bucket. Can
> you please re-run the test to see if you can reproduce the problem?

Ok, I'll do testing with the patch applied.

Btw, I also find entries like following in the logging database:

ERROR: could not read block 2638 in file "base/16384/17256": read only 0 of 8192 bytes

…with relfilenode being a hash index. I usually ignore these as they
naturally start occurring after a recovery because of an unrelated crash.
But since 11003eb, they also occur when the cluster has not yet suffered
a crash.

regards,
Andreas
Re: [HACKERS] [sqlsmith] Failed assertion in _hash_splitbucket_guts
On Sat, Dec 3, 2016 at 6:58 AM, Amit Kapila wrote:
> On Sat, Dec 3, 2016 at 2:06 AM, Andreas Seltenreich wrote:
>> Hi,
>>
>> the new hash index code on 11003eb failed an assertion yesterday:
>>
>> TRAP: FailedAssertion("!(bucket == obucket)", File: "hashpage.c", Line:
>> 1037)
>
> _hash_expandtable(Relation rel, Buffer metabuf)
> {
> ..
>     if (H_NEEDS_SPLIT_CLEANUP(oopaque))
>     {
>         /* Release the metapage lock. */
>         _hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
>
>         hashbucketcleanup(rel, old_bucket, buf_oblkno, start_oblkno, NULL,
>                           metap->hashm_maxbucket, metap->hashm_highmask,
>                           metap->hashm_lowmask, NULL,
>                           NULL, true, NULL, NULL);
> ..
> }
>
> Here we shouldn't be accessing meta page after releasing the lock as
> concurrent activity can change these values. This can be fixed by
> storing these values in local variables before releasing the lock and
> passing local variables in hashbucketcleanup(). I will send patch
> shortly.
>

Please find attached patch to fix above code. Now, if this is the
reason of the problem you are seeing, it won't fix your existing
database as it already contains some tuples in the wrong bucket. Can
you please re-run the test to see if you can reproduce the problem?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

fix_hash_bucketsplit_sqlsmith_v1.patch
Description: Binary data
Re: [HACKERS] [sqlsmith] Failed assertion in _hash_splitbucket_guts
On Sat, Dec 3, 2016 at 2:06 AM, Andreas Seltenreich wrote:
> Hi,
>
> the new hash index code on 11003eb failed an assertion yesterday:
>
> TRAP: FailedAssertion("!(bucket == obucket)", File: "hashpage.c", Line:
> 1037)
>

This can happen if we start a new split before completing the previous
split of a bucket, or if tuples from the previous split still remain in
the bucket. I see a problem in the code below:

_hash_expandtable(Relation rel, Buffer metabuf)
{
..
    if (H_NEEDS_SPLIT_CLEANUP(oopaque))
    {
        /* Release the metapage lock. */
        _hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);

        hashbucketcleanup(rel, old_bucket, buf_oblkno, start_oblkno, NULL,
                          metap->hashm_maxbucket, metap->hashm_highmask,
                          metap->hashm_lowmask, NULL,
                          NULL, true, NULL, NULL);
..
}

Here we shouldn't be accessing the meta page after releasing the lock,
as concurrent activity can change these values. This can be fixed by
storing these values in local variables before releasing the lock and
passing the local variables to hashbucketcleanup(). I will send a patch
shortly. However, I wanted to verify that this is the reason why you
are seeing the problem. I could not connect to the database provided by
you.

> Statement was
>
> update public.hash_i4_heap set seqno = public.hash_i4_heap.random;
>
> It can be reproduced with the data directory (Debian stretch amd64) I've
> put here:
>
> http://ansel.ydns.eu/~andreas/_hash_splitbucket_guts.tar.xz (12 MB)
>
> Backtrace below. The cluster hasn't suffered any crashes before this
> incident.
>

How should I connect to this database? If I use the user fdw mentioned
in pg_hba.conf (changed authentication method to trust in pg_hba.conf),
it says the user doesn't exist. Can you create a user in the database
which I can use?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
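The fix described in this message, copying the metapage values into locals before the lock is released, might look roughly like the sketch below. It is a standalone model rather than the committed patch: `MetaPageModel`, `SplitParams`, `snapshot_and_unlock`, and `concurrent_split` are hypothetical names, a boolean flag stands in for the buffer lock, and the mask update in `concurrent_split` is only an assumed stand-in for what another backend's split could do to the shared values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared metapage fields used by cleanup. */
typedef struct MetaPageModel
{
    bool     locked;        /* true while the caller holds the lock */
    uint32_t maxbucket;
    uint32_t highmask;
    uint32_t lowmask;
} MetaPageModel;

/* Private copy of the values, safe to use after the lock is released. */
typedef struct SplitParams
{
    uint32_t maxbucket;
    uint32_t highmask;
    uint32_t lowmask;
} SplitParams;

/* Copy the shared values while the lock is held, then release it. */
static SplitParams
snapshot_and_unlock(MetaPageModel *metap)
{
    SplitParams params;

    assert(metap->locked);      /* reading is only safe under the lock */
    params.maxbucket = metap->maxbucket;
    params.highmask = metap->highmask;
    params.lowmask = metap->lowmask;
    metap->locked = false;      /* from here on, use params only */
    return params;
}

/* Assumed model of another backend adding one bucket after the unlock. */
static void
concurrent_split(MetaPageModel *metap)
{
    assert(!metap->locked);
    metap->maxbucket += 1;
    if (metap->maxbucket > metap->highmask)
    {
        /* Table doubled: the old high mask becomes the low mask. */
        metap->lowmask = metap->highmask;
        metap->highmask = metap->maxbucket | metap->lowmask;
    }
}
```

A caller would take the snapshot, release the lock, and hand `params.maxbucket`, `params.highmask`, and `params.lowmask` to the cleanup routine instead of dereferencing the shared structure again. If `concurrent_split` runs in between, the shared values change while the snapshot stays consistent with the state that was observed under the lock.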