Re: [HACKERS] [sqlsmith] Short reads in hash indexes

2016-12-30 Thread Amit Kapila
On Fri, Dec 30, 2016 at 3:45 AM, Andreas Seltenreich  wrote:
> Amit Kapila writes:
>
>> Can you please try with the patch posted on hash index thread [1] to
>> see if you can reproduce any of these problems?
>>
>> [1] - 
>> https://www.postgresql.org/message-id/CAA4eK1Kf6tOY0oVz_SEdngiNFkeXrA3xUSDPPORQvsWVPdKqnA%40mail.gmail.com
>
> I'm no longer seeing the failed assertions or the short reads since
> these patches went in.
>

Thanks for the confirmation!

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Short reads in hash indexes

2016-12-29 Thread Andreas Seltenreich
Amit Kapila writes:

> Can you please try with the patch posted on hash index thread [1] to
> see if you can reproduce any of these problems?
>
> [1] - 
> https://www.postgresql.org/message-id/CAA4eK1Kf6tOY0oVz_SEdngiNFkeXrA3xUSDPPORQvsWVPdKqnA%40mail.gmail.com

I'm no longer seeing the failed assertions or the short reads since
these patches went in.

regards,
Andreas


Re: [HACKERS] [sqlsmith] Short reads in hash indexes

2016-12-10 Thread Amit Kapila
On Thu, Dec 8, 2016 at 2:38 AM, Andreas Seltenreich  wrote:
> Andreas Seltenreich writes:
>
>> Amit Kapila writes:
>>
>>> On Sat, Dec 3, 2016 at 3:44 PM, Andreas Seltenreich wrote:
>>>> Amit Kapila writes:
>>>>
>>>>> [2. text/x-diff; fix_hash_bucketsplit_sqlsmith_v1.patch]
>>>> Ok, I'll do testing with the patch applied.
>>
>> Good news: the assertion hasn't fired since the patch is in.
>
> Meh, it fired again today after being silent for 100e6 queries :-/
> I guess I need to add some confidence qualification on such statements.
> Maybe sigmas as they do at CERN…
>

This assertion can also be reproduced with Jeff's test, and a fix for
it has been posted [1].

>> smith=# select * from state_report where sqlstate = 'XX001';
>> -[ RECORD 1 ]--
>> count    | 10
>> sqlstate | XX001
>> sample   | ERROR:  could not read block 1173 in file "base/16384/17256": read only 0 of 8192 bytes
>> hosts    | {airbisquit,frell,gorgo,marbit,pillcrow,quakken}
>>
>>> Hmm, I am not sure if this is related to the previous problem, but it
>>> could be.  Is it possible to get the operation and/or call stack for
>>> the above failure?
>>
>> Ok, will turn the elog into an assertion to get at the backtraces.
>
> Doing so on top of 4212cb7, I caught the backtrace below.  Query was:
>
> --8<---cut here---start->8---
> set max_parallel_workers_per_gather = 0;
> select  count(1) from
>    public.hash_name_heap as ref_2
>    join public.rtest_emplog as sample_1
>   on (ref_2.random = sample_1.who);
> --8<---cut here---end--->8---
>
> I've put the data directory where it can be reproduced here:
>
> http://ansel.ydns.eu/~andreas/hash_index_short_read.tar.xz (12MB)
>

This can happen when the buffer is not marked dirty: the index page
from which we deleted the tuples is then never flushed, even though
vacuum has already removed the corresponding heap tuples.  The next
access to that hash index page brings back the old copy of the page,
which still contains the tuples that vacuum was supposed to delete.
Following those tuples gives wrong information about the heap, and when
we try to access the deleted heap tuples, we can hit the short read
problem.
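
As a rough illustration of that sequence, here is a toy Python model.
It is not PostgreSQL code: ToyDisk, ToyBufferPool, vacuum, index_scan
and run are hypothetical names made up for this sketch, and it assumes
a single cached index page whose entries are plain heap block numbers.
The only point is to show how skipping the "mark dirty" step lets a
stale index page come back from disk after the heap has been truncated.

```python
class ShortRead(Exception):
    """Raised when a heap block past the end of the file is requested."""


class ToyDisk:
    """Hypothetical stand-in for on-disk state: heap length plus one index page."""

    def __init__(self, heap_blocks, index_entries):
        self.heap_nblocks = heap_blocks
        self.index_page = list(index_entries)  # heap block numbers the index points at

    def read_heap_block(self, blkno):
        if blkno >= self.heap_nblocks:
            raise ShortRead(
                f"could not read block {blkno}: read only 0 of 8192 bytes"
            )
        return f"heap block {blkno}"


class ToyBufferPool:
    """Hypothetical one-page cache that writes back only dirty pages."""

    def __init__(self, disk):
        self.disk = disk
        self.page = None
        self.dirty = False

    def pin_index_page(self):
        if self.page is None:  # cache miss: load the on-disk copy
            self.page = list(self.disk.index_page)
            self.dirty = False
        return self.page

    def mark_dirty(self):
        self.dirty = True

    def evict(self):
        if self.page is not None and self.dirty:
            self.disk.index_page = list(self.page)  # flush the modified page
        self.page = None  # a clean page is simply dropped
        self.dirty = False


def vacuum(pool, disk, dead_from_block, mark_dirty):
    """Delete index entries for dead heap blocks, then truncate the heap."""
    page = pool.pin_index_page()
    page[:] = [b for b in page if b < dead_from_block]
    if mark_dirty:  # the step whose omission is modelled here
        pool.mark_dirty()
    disk.heap_nblocks = dead_from_block


def index_scan(pool, disk):
    """Follow every index entry to its heap block."""
    return [disk.read_heap_block(b) for b in pool.pin_index_page()]


def run(mark_dirty):
    disk = ToyDisk(heap_blocks=4, index_entries=[0, 1, 2, 3])
    pool = ToyBufferPool(disk)
    vacuum(pool, disk, dead_from_block=2, mark_dirty=mark_dirty)
    pool.evict()  # the page leaves the cache before the next scan
    try:
        return index_scan(pool, disk)
    except ShortRead as exc:
        return str(exc)
```

In this model, run(True) scans only the two surviving heap blocks,
while run(False) reloads the stale index page and fails with a short
read on the first truncated block.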

Can you please try with the patch posted on hash index thread [1] to
see if you can reproduce any of these problems?

[1] - 
https://www.postgresql.org/message-id/CAA4eK1Kf6tOY0oVz_SEdngiNFkeXrA3xUSDPPORQvsWVPdKqnA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] [sqlsmith] Short reads in hash indexes

2016-12-07 Thread Amit Kapila
On Thu, Dec 8, 2016 at 2:38 AM, Andreas Seltenreich  wrote:
> Andreas Seltenreich writes:
>
>> Amit Kapila writes:
>>
>>> On Sat, Dec 3, 2016 at 3:44 PM, Andreas Seltenreich wrote:
>>>> Amit Kapila writes:
>>>>
>>>>> [2. text/x-diff; fix_hash_bucketsplit_sqlsmith_v1.patch]
>>>> Ok, I'll do testing with the patch applied.
>>
>> Good news: the assertion hasn't fired since the patch is in.
>
> Meh, it fired again today after being silent for 100e6 queries :-/
> I guess I need to add some confidence qualification on such statements.
> Maybe sigmas as they do at CERN…
>
>> smith=# select * from state_report where sqlstate = 'XX001';
>> -[ RECORD 1 ]--
>> count    | 10
>> sqlstate | XX001
>> sample   | ERROR:  could not read block 1173 in file "base/16384/17256": read only 0 of 8192 bytes
>> hosts    | {airbisquit,frell,gorgo,marbit,pillcrow,quakken}
>>
>>> Hmm, I am not sure if this is related to the previous problem, but it
>>> could be.  Is it possible to get the operation and/or call stack for
>>> the above failure?
>>
>> Ok, will turn the elog into an assertion to get at the backtraces.
>
> Doing so on top of 4212cb7, I caught the backtrace below.  Query was:
>

Thanks for the report; I will look into it.  I think this one is quite
similar to what Jeff has reported [1].

[1] - 
https://www.postgresql.org/message-id/CAMkU%3D1ydfriLCOriJ%3DAxtF%3DhhBOUUcWtf172vquDrj%3D3T7yXmg%40mail.gmail.com


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] [sqlsmith] Short reads in hash indexes

2016-12-07 Thread Andreas Seltenreich
Andreas Seltenreich writes:

> Amit Kapila writes:
>
>> On Sat, Dec 3, 2016 at 3:44 PM, Andreas Seltenreich wrote:
>>> Amit Kapila writes:
>>>
>>>> [2. text/x-diff; fix_hash_bucketsplit_sqlsmith_v1.patch]
>>> Ok, I'll do testing with the patch applied.
>
> Good news: the assertion hasn't fired since the patch is in.

Meh, it fired again today after being silent for 100e6 queries :-/
I guess I need to add some confidence qualification on such statements.
Maybe sigmas as they do at CERN…

> smith=# select * from state_report where sqlstate = 'XX001';
> -[ RECORD 1 ]--
> count    | 10
> sqlstate | XX001
> sample   | ERROR:  could not read block 1173 in file "base/16384/17256": read only 0 of 8192 bytes
> hosts    | {airbisquit,frell,gorgo,marbit,pillcrow,quakken}
>
>> Hmm, I am not sure if this is related to the previous problem, but it
>> could be.  Is it possible to get the operation and/or call stack for
>> the above failure?
>
> Ok, will turn the elog into an assertion to get at the backtraces.

Doing so on top of 4212cb7, I caught the backtrace below.  Query was:

--8<---cut here---start->8---
set max_parallel_workers_per_gather = 0;
select  count(1) from
   public.hash_name_heap as ref_2
   join public.rtest_emplog as sample_1
  on (ref_2.random = sample_1.who);
--8<---cut here---end--->8---

I've put the data directory where it can be reproduced here:

http://ansel.ydns.eu/~andreas/hash_index_short_read.tar.xz (12MB)

regards,
Andreas

TRAP: FailedAssertion("!(!"short read of block")", File: "md.c", Line: 782)
#2  0x007f7f11 in ExceptionalCondition 
(conditionName=conditionName@entry=0x9a1ae9 "!(!\"short read of block\")", 
errorType=errorType@entry=0x83db3d "FailedAssertion", 
fileName=fileName@entry=0x946a9a "md.c", lineNumber=lineNumber@entry=782) at 
assert.c:54
#3  0x006fb305 in mdread (reln=, forknum=, blocknum=4702, buffer=0x7fe97e7e1280 "\"") at md.c:782
#4  0x006d0ffa in ReadBuffer_common (smgr=0x2af7408, 
relpersistence=, forkNum=forkNum@entry=MAIN_FORKNUM, 
blockNum=blockNum@entry=4702, mode=RBM_NORMAL, strategy=, 
hit=0x7ffde9df11cf "") at bufmgr.c:890
#5  0x006d1a20 in ReadBufferExtended (reln=0x2fd10d8, 
forkNum=forkNum@entry=MAIN_FORKNUM, blockNum=4702, mode=mode@entry=RBM_NORMAL, 
strategy=strategy@entry=0x0) at bufmgr.c:664
#6  0x006d1b74 in ReadBuffer (blockNum=, reln=) at bufmgr.c:596
#7  ReleaseAndReadBuffer (buffer=buffer@entry=87109984, relation=, blockNum=) at bufmgr.c:1540
#8  0x004c047b in index_fetch_heap (scan=scan@entry=0x5313160) at 
indexam.c:469
#9  0x004c05ee in index_getnext (scan=scan@entry=0x5313160, 
direction=direction@entry=ForwardScanDirection) at indexam.c:565
#10 0x005f9b71 in IndexNext (node=node@entry=0x5311c48) at 
nodeIndexscan.c:105
#11 0x005ec492 in ExecScanFetch (recheckMtd=0x5f9af0 , 
accessMtd=0x5f9b30 , node=0x5311c48) at execScan.c:95
#12 ExecScan (node=0x5311c48, accessMtd=0x5f9b30 , 
recheckMtd=0x5f9af0 ) at execScan.c:145
#13 0x005e4da8 in ExecProcNode (node=node@entry=0x5311c48) at 
execProcnode.c:427
#14 0x006014f9 in ExecNestLoop (node=node@entry=0x53110a8) at 
nodeNestloop.c:174
#15 0x005e4cf8 in ExecProcNode (node=node@entry=0x53110a8) at 
execProcnode.c:476
#16 0x00601436 in ExecNestLoop (node=node@entry=0x5310e00) at 
nodeNestloop.c:123
#17 0x005e4cf8 in ExecProcNode (node=node@entry=0x5310e00) at 
execProcnode.c:476
#18 0x00601436 in ExecNestLoop (node=node@entry=0x530f698) at 
nodeNestloop.c:123
#19 0x005e4cf8 in ExecProcNode (node=node@entry=0x530f698) at 
execProcnode.c:476
#20 0x005e0e9e in ExecutePlan (dest=0x603a4a8, direction=, numberTuples=0, sendTuples=, operation=CMD_SELECT, 
use_parallel_mode=, planstate=0x530f698, estate=0x46bc008) at 
execMain.c:1568
#21 standard_ExecutorRun (queryDesc=0x3475168, direction=, 
count=0) at execMain.c:338
#22 0x007029f8 in PortalRunSelect (portal=portal@entry=0x2561e18, 
forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, 
dest=dest@entry=0x603a4a8) at pquery.c:946
#23 0x00703f3e in PortalRun (portal=portal@entry=0x2561e18, 
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', 
dest=dest@entry=0x603a4a8, altdest=altdest@entry=0x603a4a8, 
completionTag=completionTag@entry=0x7ffde9df18b0 "") at pquery.c:787
#24 0x00700d5b in exec_simple_query (query_string=0x4685258 ) at 
postgres.c:1094
#25 PostgresMain (argc=, argv=argv@entry=0x256f5a8, 
dbname=0x256f580 "regression", username=) at postgres.c:4069
#26 0x0046daf2 in BackendRun (port=0x25645a0) at postmaster.c:4274
#27 BackendStartup (port=0x25645a0) at postmaster.c:3946
#28 ServerLoop () at postmaster.c:1704
#29 0x00699d28 in PostmasterMain (argc=argc@entry=4,