Re: [HACKERS] [sqlsmith] Failed assertion in _hash_splitbucket_guts

2016-12-05 Thread Robert Haas
On Fri, Dec 2, 2016 at 10:04 PM, Amit Kapila  wrote:
>> Here we shouldn't be accessing meta page after releasing the lock as
>> concurrent activity can change these values.  This can be fixed by
>> storing these values in local variables before releasing the lock and
>> passing local variables in hashbucketcleanup().  I will send patch
>> shortly.
>
> Please find attached patch to fix above code.

Committed.  I don't know either whether this will fix things for
Andreas, but it's certainly a bug fix in its own right.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertion in _hash_splitbucket_guts

2016-12-03 Thread Amit Kapila
On Sat, Dec 3, 2016 at 3:44 PM, Andreas Seltenreich  wrote:
> Amit Kapila writes:
>
>> How should I connect to this database?  If I use the user fdw
>> mentioned in pg_hba.conf (changed authentication method to trust in
>> pg_hba.conf), it says the user doesn't exist.  Can you create a user
>> in the database which I can use?
>
> There is also a superuser "postgres" and an unprivileged user "smith"
> you should be able to login with.  You could also start postgres in
> single-user mode to bypass the authentication altogether.
>

Thanks.  I have checked, and my speculation above seems to be right:
the old bucket contains tuples left over from a previous split.  At
the location of the Assert, I printed the old bucket, the new bucket,
and the actual bucket to which the tuple belongs; the result is
below.

regression=# update public.hash_i4_heap set seqno = public.hash_i4_heap.random;
ERROR:  wrong bucket, old bucket:37, new bucket:549, actual bucket:293

This means the tuple should belong to either bucket 37 or bucket 549,
but it actually belongs to bucket 293.  Both 293 and 549 are buckets
that were split from bucket 37 (you can verify that with the
calculation used in _hash_expandtable).  I have checked the code again
and couldn't find any cause other than the one I mentioned in my
previous mail.  So, let us wait for the results of your new test run.
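
For reference, the bucket arithmetic can be sketched as follows.  This
is a minimal standalone sketch, not the PostgreSQL source: the names
hashkey_to_bucket and split_parent are made up for illustration, and it
assumes the usual linear-hashing scheme in which a new bucket is split
from the bucket obtained by clearing its highest set bit.  Under that
assumption, 293 = 256 + 37 and 549 = 512 + 37, so both trace back to
bucket 37.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative only: map a hash value to a bucket number using the
 * maxbucket/highmask/lowmask triple kept in the hash metapage.
 * A bucket beyond maxbucket does not exist yet, so the value falls
 * back to the bucket it will later be split from.
 */
static uint32_t
hashkey_to_bucket(uint32_t hashkey, uint32_t maxbucket,
                  uint32_t highmask, uint32_t lowmask)
{
    uint32_t bucket = hashkey & highmask;

    if (bucket > maxbucket)
        bucket &= lowmask;
    return bucket;
}

/*
 * Illustrative only: return the bucket that 'bucket' was split from,
 * i.e. the bucket number with its highest set bit cleared.
 * Bucket 0 has no parent, so it is rejected.
 */
static uint32_t
split_parent(uint32_t bucket)
{
    uint32_t topbit = 1;

    assert(bucket > 0);
    /* Find the highest set bit without risking a shift overflow. */
    while (topbit <= bucket / 2)
        topbit <<= 1;
    return bucket & (topbit - 1);
}
```

With these helpers, split_parent(293) and split_parent(549) both
return 37, which matches the numbers in the error message above.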

> Amit Kapila writes:
>
>> Please find attached patch to fix above code.  Now, if this is the
>> reason of the problem you are seeing, it won't fix your existing
>> database as it already contains some tuples in the wrong bucket.  Can
>> you please re-run the test to see if you can reproduce the problem?
>
> Ok, I'll do testing with the patch applied.
>
> Btw, I also find entries like the following in the logging database:
>
> ERROR:  could not read block 2638 in file "base/16384/17256": read only 0 of 8192 bytes
>
> …with relfilenode being a hash index.  I usually ignore these as they
> naturally start occurring after a recovery because of an unrelated crash.
> But since 11003eb, they also occur when the cluster has not yet suffered
> a crash.
>

Hmm, I am not sure whether this is related to the previous problem, but
it could be.  Is it possible to get the operation and/or call stack for
the above failure?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [sqlsmith] Failed assertion in _hash_splitbucket_guts

2016-12-03 Thread Andreas Seltenreich
Amit Kapila writes:

> How should I connect to this database?  If I use the user fdw
> mentioned in pg_hba.conf (changed authentication method to trust in
> pg_hba.conf), it says the user doesn't exist.  Can you create a user
> in the database which I can use?

There is also a superuser "postgres" and an unprivileged user "smith"
you should be able to login with.  You could also start postgres in
single-user mode to bypass the authentication altogether.

Amit Kapila writes:

> Please find attached patch to fix above code.  Now, if this is the
> reason of the problem you are seeing, it won't fix your existing
> database as it already contains some tuples in the wrong bucket.  Can
> you please re-run the test to see if you can reproduce the problem?

Ok, I'll do testing with the patch applied.

Btw, I also find entries like the following in the logging database:

ERROR:  could not read block 2638 in file "base/16384/17256": read only 0 of 8192 bytes

…with relfilenode being a hash index.  I usually ignore these as they
naturally start occurring after a recovery because of an unrelated crash.
But since 11003eb, they also occur when the cluster has not yet suffered
a crash.

regards,
Andreas




Re: [HACKERS] [sqlsmith] Failed assertion in _hash_splitbucket_guts

2016-12-02 Thread Amit Kapila
On Sat, Dec 3, 2016 at 6:58 AM, Amit Kapila  wrote:
> On Sat, Dec 3, 2016 at 2:06 AM, Andreas Seltenreich  wrote:
>> Hi,
>>
>> the new hash index code on 11003eb failed an assertion yesterday:
>>
>> TRAP: FailedAssertion("!(bucket == obucket)", File: "hashpage.c", Line: 1037)
>
> _hash_expandtable(Relation rel, Buffer metabuf)
> {
> ..
> if (H_NEEDS_SPLIT_CLEANUP(oopaque))
> {
> /* Release the metapage lock. */
> _hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
>
> hashbucketcleanup(rel, old_bucket, buf_oblkno, start_oblkno, NULL,
>   metap->hashm_maxbucket, metap->hashm_highmask,
>   metap->hashm_lowmask, NULL,
>   NULL, true, NULL, NULL);
> ..
> }
>
> Here we shouldn't be accessing meta page after releasing the lock as
> concurrent activity can change these values.  This can be fixed by
> storing these values in local variables before releasing the lock and
> passing local variables in hashbucketcleanup().  I will send patch
> shortly.
>

Please find attached a patch to fix the above code.  Note that if this
is the cause of the problem you are seeing, the patch won't repair your
existing database, as it already contains some tuples in the wrong
bucket.  Can you please re-run the test to see if you can reproduce the
problem?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


fix_hash_bucketsplit_sqlsmith_v1.patch
Description: Binary data



Re: [HACKERS] [sqlsmith] Failed assertion in _hash_splitbucket_guts

2016-12-02 Thread Amit Kapila
On Sat, Dec 3, 2016 at 2:06 AM, Andreas Seltenreich  wrote:
> Hi,
>
> the new hash index code on 11003eb failed an assertion yesterday:
>
> TRAP: FailedAssertion("!(bucket == obucket)", File: "hashpage.c", Line: 1037)
>

This can happen if we start a new split before completing the previous
split of a bucket, or if tuples left over from the previous split are
still present in the bucket being split.  I see a problem in the code
below:

_hash_expandtable(Relation rel, Buffer metabuf)
{
    ..
    if (H_NEEDS_SPLIT_CLEANUP(oopaque))
    {
        /* Release the metapage lock. */
        _hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);

        hashbucketcleanup(rel, old_bucket, buf_oblkno, start_oblkno, NULL,
                          metap->hashm_maxbucket, metap->hashm_highmask,
                          metap->hashm_lowmask, NULL,
                          NULL, true, NULL, NULL);
        ..
    }

Here we shouldn't be accessing the meta page after releasing the lock,
as concurrent activity can change these values.  This can be fixed by
storing these values in local variables before releasing the lock and
passing the local variables to hashbucketcleanup().  I will send a
patch shortly.  However, I wanted to verify that this is the reason why
you are seeing the problem.  I could not connect to the database you
provided.
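
The idea of the fix can be sketched in isolation as follows.  This is
not the attached patch and not PostgreSQL code: the DemoMetaPage and
DemoMetaSnapshot structs, the boolean "locked" flag standing in for the
buffer lock, and both function names are assumptions made purely for
illustration.  It shows the values being copied while the lock is
still held, so that a later change to the shared page cannot affect
them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared hash metapage. */
typedef struct
{
    bool     locked;        /* stand-in for the metapage buffer lock */
    uint32_t maxbucket;
    uint32_t highmask;
    uint32_t lowmask;
} DemoMetaPage;

/* Private copy of the values that the cleanup step needs. */
typedef struct
{
    uint32_t maxbucket;
    uint32_t highmask;
    uint32_t lowmask;
} DemoMetaSnapshot;

/*
 * Copy the needed fields while the lock is held, then release it.
 * The caller uses only the returned copy afterwards.
 */
static DemoMetaSnapshot
demo_snapshot_and_unlock(DemoMetaPage *meta)
{
    DemoMetaSnapshot snap;

    assert(meta->locked);
    snap.maxbucket = meta->maxbucket;
    snap.highmask = meta->highmask;
    snap.lowmask = meta->lowmask;
    meta->locked = false;
    return snap;
}

/*
 * Simulate another backend adding a bucket once the lock is free.
 * When the new bucket number exceeds highmask, both masks grow.
 */
static void
demo_concurrent_split(DemoMetaPage *meta)
{
    assert(!meta->locked);
    meta->maxbucket += 1;
    if (meta->maxbucket > meta->highmask)
    {
        meta->lowmask = meta->highmask;
        meta->highmask = (meta->highmask << 1) | 1;
    }
}
```

In this sketch, a snapshot taken before demo_concurrent_split() runs
keeps the old maxbucket and masks, whereas reading the shared struct
after the unlock would pick up the changed values.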

> Statement was
>
> update public.hash_i4_heap set seqno = public.hash_i4_heap.random;
>
> It can be reproduced with the data directory (Debian stretch amd64) I've
> put here:
>
> http://ansel.ydns.eu/~andreas/_hash_splitbucket_guts.tar.xz (12 MB)
>
> Backtrace below.  The cluster hasn't suffered any crashes before this
> incident.
>

How should I connect to this database?  If I use the user fdw
mentioned in pg_hba.conf (changed authentication method to trust in
pg_hba.conf), it says the user doesn't exist.  Can you create a user
in the database which I can use?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

