Ming LI created HAWQ-1094:
-----------------------------

             Summary: Select on INTERNAL table returns wrong results when hdfs 
blocks have checksum errors
                 Key: HAWQ-1094
                 URL: https://issues.apache.org/jira/browse/HAWQ-1094
             Project: Apache HAWQ
          Issue Type: Bug
          Components: Fault Tolerance
            Reporter: Ming LI
            Assignee: Lei Chang


I created a parquet table and inserted the following values into the table:

{code}
sr37228_repro=# select * from number;
 id
----
  1
  1
  1
  1
  1
(5 rows)
{code}

I then modified the data in two of the three blocks and tried reading the data 
again.

{code}
Modifying contents of internal table blocks...

Found hdfs://hdm1.hdp.local:8020/hawq_default/16385/16543/17000/10 in hdfs

Modifying block 
/hadoop/hdfs/data/current/BP-2023073008-172.28.21.63-1462922052672/current/finalized/subdir0/subdir0/blk_1073742008
 on 172.28.21.155
block_script.sh                                                                 
                                       100%  228     0.2KB/s   00:00
Modifying block 
/hadoop/hdfs/data/current/BP-2023073008-172.28.21.63-1462922052672/current/finalized/subdir0/subdir0/blk_1073742008
 on 172.28.21.156
block_script.sh                                                                 
                                       100%  228     0.2KB/s   00:00

Running count query again, this time with bad data in two of the three blocks
 count |    id
-------+----------
     1 |        0
     2 |        1
     1 | 16777216
     1 | 16777217
(4 rows)


Checking Showing file health:

Checking hdfs://hdm1.hdp.local:8020/hawq_default/16385/16543/17000/10 health
Connecting to namenode via 
http://hdm1.hdp.local:50070/fsck?ugi=gpadmin&blocks=1&locations=1&files=1&path=%2Fhawq_default%2F16385%2F16543%2F17000%2F10
FSCK started by gpadmin (auth:SIMPLE) from /172.28.21.157 for path 
/hawq_default/16385/16543/17000/10 at Mon Sep 26 12:07:53 PDT 2016
/hawq_default/16385/16543/17000/10 206 bytes, 1 block(s):  OK
0. BP-2023073008-172.28.21.63-1462922052672:blk_1073742008_1186 len=206 repl=3 
[DatanodeInfoWithStorage[172.28.21.155:50010,DS-1a18c785-48e5-4ab8-9228-b3f6857b952a,DISK],
 
DatanodeInfoWithStorage[172.28.19.211:50010,DS-6bf49ae7-6745-448b-803d-d12d93acad1d,DISK],
 
DatanodeInfoWithStorage[172.28.21.156:50010,DS-d22b0f7f-7065-42c4-bb66-ea361ec5e56a,DISK]]

Status: HEALTHY
 Total size:    206 B
 Total dirs:    0
 Total files:   1
 Total symlinks:                0
 Total blocks (validated):      1 (avg. block size 206 B)
 Minimally replicated blocks:   1 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          3
 Number of racks:               1
FSCK ended at Mon Sep 26 12:07:53 PDT 2016 in 0 milliseconds
{code}

When setupBlockReader reads a bad block using the LocalBlockReader, the reader 
correctly detects a bad checksum.

{code}
2016-09-26 13:02:09.267021 
PDT,,,p380682,th795609216,,,,0,,,seg-10000,,,,,"LOG","00000","Resource manager 
discovered local host IPv4 address 127.0.0.1",,,,,,,0,,"network_utils.c",210,
2016-09-26 13:02:09.267171 
PDT,,,p380682,th795609216,,,,0,,,seg-10000,,,,,"LOG","00000","Resource manager 
discovered local host IPv4 address 
172.28.21.155",,,,,,,0,,"network_utils.c",210,
2016-09-26 13:02:16.239048 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,6227,con143,cmd72,seg1,,,x6227,sx1,"DEBUG1","00000","Dropping in 
memory mapping OidInMemHeapMapping",,,,,,"SET log_min_messages TO 
'debug5'",0,,"cdbinmemheapam.c",293,
2016-09-26 13:02:16.239289 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 
PDT,6227,con143,cmd72,seg1,,,x6227,sx1,"DEBUG3","00000","CommitTransactionCommand",,,,,,"SET
 log_min_messages TO 'debug5'",0,,"postgres.c",3131,
2016-09-26 13:02:16.239435 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 
PDT,6227,con143,cmd72,seg1,,,x6227,sx1,"DEBUG3","00000","CommitTransaction",,,,,,"SET
 log_min_messages TO 'debug5'",0,,"xact.c",5103,
2016-09-26 13:02:16.239819 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,6227,con143,cmd72,seg1,,,x6227,sx1,"DEBUG3","00000","name: 
unnamed; blockState:       STARTED; state: INPROGR, xid/subid/cid: 6227/1/0, 
nestlvl: 1, children: <>",,,,,,"SET log_min_messages TO 
'debug5'",0,,"xact.c",5128,
2016-09-26 13:02:16.239978 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,6227,con143,cmd72,seg1,,,x6227,sx1,"DEBUG1","00000","Dropping in 
memory mapping OidInMemOnlyMapping",,,,,,"SET log_min_messages TO 
'debug5'",0,,"cdbinmemheapam.c",293,
2016-09-26 13:02:25.600367 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,0,con143,,seg1,,,,,"DEBUG5","00000","First char: 'M'; gp_role = 
'execute'.",,,,,,,0,,"postgres.c",4737,
2016-09-26 13:02:25.600639 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,0,con143,cmd74,seg1,,,,,"DEBUG1","00000","Message type M received 
by from libpq, len = 1412",,,,,,,0,,"postgres.c",4813,
2016-09-26 13:02:25.600742 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,0,con143,cmd74,seg1,,,,,"DEBUG5","00000","MPP dispatched stmt 
from QD: explain analyze select * from number;.",,,,,,,0,,"postgres.c",4893,
2016-09-26 13:02:25.600847 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,0,con143,cmd74,seg1,,,,,"DEBUG1","00000","SetupProcessIdentity: 
receive msg: 
ProcessIdentity_Begin_slice_1_idx_0_gang_1_cmd_74_writer_t_End_ProcessIdentity",,,,,,,0,,"identity.c",365,
2016-09-26 13:02:25.600997 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,0,con143,cmd74,seg1,,,,,"DEBUG1","00000","ProcessIdentity is not 
init",,,,,,,0,,"identity.c",599,
2016-09-26 13:02:25.601129 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,0,con143,cmd74,seg1,,,,,"DEBUG1","00000","ProcessIdentity: slice 
1 id 0 gang num 1 writer t",,,,,,,0,,"identity.c",602,
2016-09-26 13:02:25.601250 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,0,con143,cmd74,seg0,slice1,,,,"DEBUG5","00000","Get a temporary 
directory:/tmp/hawq/segment",,,,,,,0,,"cdbtmpdir.c",48,
2016-09-26 13:02:25.601351 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 
PDT,0,con143,cmd74,seg0,slice1,,,,"DEBUG1","00000","getLocalTmpDirFromSegmentConfig
 session_id:143 command_id:74 qeidx:0 
tmpdir:/tmp/hawq/segment",,,,,,,0,,"identity.c",418,
2016-09-26 13:02:25.601784 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 
PDT,0,con143,cmd74,seg0,slice1,,,,"DEBUG3","00000","StartTransactionCommand",,,,,,"explain
 analyze select * from number;",0,,"postgres.c",3107,
2016-09-26 13:02:25.602075 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 
PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG3","00000","StartTransaction",,,,,,"explain
 analyze select * from number;",0,,"xact.c",5103,
2016-09-26 13:02:25.602195 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG3","00000","name: 
unnamed; blockState:       DEFAULT; state: INPROGR, xid/subid/cid: 6228/1/0, 
nestlvl: 1, children: <>",,,,,,"explain analyze select * from 
number;",0,,"xact.c",5128,
2016-09-26 13:02:25.602578 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
index 0 key 17000 relation pg_attribute",,,,,,"explain analyze select * from 
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.602703 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
index 1 key 17000 relation pg_attribute",,,,,,"explain analyze select * from 
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.602836 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
index 2 key 17000 relation pg_attribute",,,,,,"explain analyze select * from 
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.602994 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
index 3 key 17000 relation pg_attribute",,,,,,"explain analyze select * from 
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.603104 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
index 4 key 17000 relation pg_attribute",,,,,,"explain analyze select * from 
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.603211 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
index 5 key 17000 relation pg_attribute",,,,,,"explain analyze select * from 
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.603317 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
index 6 key 17000 relation pg_attribute",,,,,,"explain analyze select * from 
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.603572 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
index 7 key 17000 relation pg_attribute",,,,,,"explain analyze select * from 
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.603751 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
index 8 key 17002 relation pg_attribute",,,,,,"explain analyze select * from 
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.603881 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
index 9 key 17002 relation pg_attribute",,,,,,"explain analyze select * from 
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.604003 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
index 10 key 17002 relation pg_attribute",,,,,,"explain analyze select * from 
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.604110 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
index 11 key 17002 relation pg_attribute",,,,,,"explain analyze select * from 
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.604216 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
index 12 key 17002 relation pg_attribute",,,,,,"explain analyze select * from 
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.604323 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
index 13 key 17002 relation pg_attribute",,,,,,"explain analyze select * from 
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.604555 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
index 14 key 17002 relation pg_attribute",,,,,,"explain analyze select * from 
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.604697 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
index 15 key 17002 relation pg_attribute",,,,,,"explain analyze select * from 
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.604848 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
index 16 key 17002 relation pg_attribute",,,,,,"explain analyze select * from 
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.604959 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
index 17 key 17002 relation pg_attribute",,,,,,"explain analyze select * from 
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.605064 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add 
index 18 key 17002 relation pg_attribute",,,,,,"explain analyze select * from 
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.605591 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 
PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG3","00000","Resource 
enforcer finds cpu sub-system is disabled",,,,,,"explain analyze select * from 
number;",0,,"resourceenforcer.c",908,
2016-09-26 13:02:25.605716 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 
PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG2","00000","Current nice 
level of the process: 19",,,,,,"explain analyze select * from 
number;",0,,"postgres.c",283,
2016-09-26 13:02:25.605856 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 
PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG2","00000","Reniced process 
to level 19",,,,,,"explain analyze select * from number;",0,,"postgres.c",302,
2016-09-26 13:02:25.606073 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 
PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG5","00000","GetSnapshotData 
setting globalxmin and xmin to 6228",,,,,,"explain analyze select * from 
number;",0,,"procarray.c",552,
2016-09-26 13:02:25.606306 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 
PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","Inserted entry 
for query (sessionid=143, commandcnt=74)",,,,,,"explain analyze select * from 
number;",0,,"workfile_queryspace.c",283,
2016-09-26 13:02:25.606748 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","Have 
both IPv6 and IPv4 choices",,,,,,"explain analyze select * from 
number;",0,,"ic_udp.c",1291,
2016-09-26 13:02:25.606978 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 
PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","receive socket 
ai_family 10 ai_socktype 2 ai_protocol 17",,,,,,"explain analyze select * from 
number;",0,,"ic_udp.c",1303,
2016-09-26 13:02:25.607098 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 
PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","receive socket 6 
ai_family 10 ai_socktype 2 ai_protocol 17",,,,,,"explain analyze select * from 
number;",0,,"ic_udp.c",1307,
2016-09-26 13:02:25.607207 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","bind 
addrlen 28 fam 10",,,,,,"explain analyze select * from 
number;",0,,"ic_udp.c",1318,
2016-09-26 13:02:25.607320 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 
PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","UDP-IC: xmit 
default buffer size 124928 bytes",,,,,,"explain analyze select * from 
number;",0,,"ic_udp.c",2200,
2016-09-26 13:02:25.607555 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 
PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","UDP-IC: xmit use 
buffer size 2097152 bytes",,,,,,"explain analyze select * from 
number;",0,,"ic_udp.c",2215,
2016-09-26 13:02:25.607678 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 
PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","UDP-IC: xmit 
default buffer size 124928 bytes",,,,,,"explain analyze select * from 
number;",0,,"ic_udp.c",2200,
2016-09-26 13:02:25.607787 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 
PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","UDP-IC: xmit use 
buffer size 2097152 bytes",,,,,,"explain analyze select * from 
number;",0,,"ic_udp.c",2215,
2016-09-26 13:02:25.607939 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 
PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","GetSockAddr 
socket ai_family 2 ai_socktype 2 ai_protocol 17 for 
172.28.21.157",,,,,,"explain analyze select * from number;",0,,"ic_udp.c",3058,
2016-09-26 13:02:25.608052 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","We are 
inet6, remote is inet.  Converting to v4 mapped address.",,,,,,"explain analyze 
select * from number;",0,,"ic_udp.c",3137,
2016-09-26 13:02:25.608249 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read 
index 0 key 17000 for relation pg_attribute",,,,,,"explain analyze select * 
from number;",0,,"cdbinmemheapam.c",499,
2016-09-26 13:02:25.608706 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read 
index 1 key 17000 for relation pg_attribute",,,,,,"explain analyze select * 
from number;",0,,"cdbinmemheapam.c",499,
2016-09-26 13:02:25.608836 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read 
index 2 key 17000 for relation pg_attribute",,,,,,"explain analyze select * 
from number;",0,,"cdbinmemheapam.c",499,
2016-09-26 13:02:25.608966 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read 
index 3 key 17000 for relation pg_attribute",,,,,,"explain analyze select * 
from number;",0,,"cdbinmemheapam.c",499,
2016-09-26 13:02:25.609083 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read 
index 4 key 17000 for relation pg_attribute",,,,,,"explain analyze select * 
from number;",0,,"cdbinmemheapam.c",499,
2016-09-26 13:02:25.609200 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read 
index 5 key 17000 for relation pg_attribute",,,,,,"explain analyze select * 
from number;",0,,"cdbinmemheapam.c",499,
2016-09-26 13:02:25.609316 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read 
index 6 key 17000 for relation pg_attribute",,,,,,"explain analyze select * 
from number;",0,,"cdbinmemheapam.c",499,
2016-09-26 13:02:25.609657 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read 
index 7 key 17000 for relation pg_attribute",,,,,,"explain analyze select * 
from number;",0,,"cdbinmemheapam.c",499,
2016-09-26 13:02:25.613152 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 12:32:31 
PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG5","00000","Parquet metadata 
file footer length index: 198",,,,,,"explain analyze select * from 
number;",0,,"cdbparquetfooterprocessor.c",141,
2016-09-26 13:02:25.676719 
PDT,,,p380675,th795609216,,,,0,,,seg-10000,,,,,"LOG","00000","3rd party error 
log:
2016-09-26 13:02:25.676477, p384452, th140708219193472, ERROR cannot setup 
block reader for Block: [block pool ID: 
BP-2023073008-172.28.21.63-1462922052672 block ID 1073742008_1186] file 
/hawq_default/16385/16543/17000/10 on Datanode: hdw2.hdp.local(172.28.21.155).
LocalBlockReader.cpp: 127: HdfsIOException: Failed to construct 
LocalBlockReader for block: [block pool ID: 
BP-2023073008-172.28.21.63-1462922052672 block ID 1073742008_1186].
        @       
Hdfs::Internal::LocalBlockReader::LocalBlockReader(boost::shared_ptr<Hdfs::Internal::ReadShortCircuitInfo>
 const&, Hdfs::Internal::ExtendedBlock const&, long, bool, 
Hdfs::Internal::SessionConfig&, std::vector<char, std::allocator<char> >&)
        @       Hdfs::Internal::InputStreamImpl::setupBlockReader(bool)
        @       Hdfs::Internal::InputStreamImpl::readOneBlock(char*, int, bool)
        @       Hdfs::Internal::InputStreamImpl::readInternal(char*, int)
        @       Hdfs::Internal::InputStreamImpl::read(char*, int)
        @       hdfsRead
        @       gpfs_hdfs_read
        @       HdfsRead
        @       FileRead
        @       readParquetFooter
        @       ParquetStorageRead_OpenFile
        @       parquet_getnext
        @       ParquetScanNext
        @       ExecTableScan
        @       ExecProcNode
        @       ExecMotion
        @       ExecProcNode
        @       ExecutePlan
        @       ExecutorRun
        @       PortalRunSelect
        @       PortalRun
        @       PostgresMain
        @       BackendStartup
        @       ServerLoop
        @       PostmasterMain
        @       main
        @       __libc_start_main
        @       Unknown
Caused by
LocalBlockReader.cpp: 283: HdfsIOException: LocalBlockReader failed to skip 
from position: 0, length: 0, block: [block pool ID: 
BP-2023073008-172.28.21.63-1462922052672 block ID 1073742008_1186].
        @       Hdfs::Internal::LocalBlockReader::skip(long)
        @       
Hdfs::Internal::LocalBlockReader::LocalBlockReader(boost::shared_ptr<Hdfs::Internal::ReadShortCircuitInfo>
 const&, Hdfs::Internal::ExtendedBlock const&, long, bool, 
Hdfs::Internal::SessionConfig&, std::vector<char, std::allocator<char> >&)
        @       Hdfs::Internal::InputStreamImpl::setupBlockReader(bool)
        @       Hdfs::Internal::InputStreamImpl::readOneBlock(char*, int, bool)
        @       Hdfs::Internal::InputStreamImpl::readInternal(char*, int)
        @       Hdfs::Internal::InputStreamImpl::read(char*, int)
        @       hdfsRead
        @       gpfs_hdfs_read
        @       HdfsRead
        @       FileRead
        @       readParquetFooter
        @       ParquetStorageRead_OpenFile
        @       parquet_getnext
        @       ParquetScanNext
        @       ExecTableScan
        @       ExecProcNode
        @       ExecMotion
        @       ExecProcNode
        @       ExecutePlan
        @       ExecutorRun
        @       PortalRunSelect
        @       PortalRun
        @       PostgresMain
        @       BackendStartup
        @       ServerLoop
        @       PostmasterMain
        @       main
        @       __libc_start_main
        @       Unknown
Caused by
LocalBlockReader.cpp: 156: ChecksumException: LocalBlockReader checksum not 
match for block: [block pool ID: BP-2023073008-172.28.21.63-1462922052672 block 
ID 1073742008_1186]
        @       Hdfs::Internal::LocalBlockReader::readAndVerify(int)
        @       Hdfs::Internal::LocalBlockReader::skip(long)
        @       
Hdfs::Internal::LocalBlockReader::LocalBlockReader(boost::shared_ptr<Hdfs::Internal::ReadShortCircuitInfo>
 const&, Hdfs::Internal::ExtendedBlock const&, long, bool, 
Hdfs::Internal::SessionConfig&, std::vector<char, std::allocator<char> >&)
        @       Hdfs::Internal::InputStreamImpl::setupBlockReader(bool)
        @       Hdfs::Internal::InputStreamImpl::readOneBlock(char*, int, bool)
        @       Hdfs::Internal::InputStreamImpl::readInternal(char*, int)
        @       Hdfs::Internal::InputStreamImpl::read(char*, int)
        @       hdfsRead
        @       gpfs_hdfs_read
        @       HdfsRead
        @       FileRead
        @       readParquetFooter
        @       ParquetStorageRead_OpenFile
        @       parquet_getnext
        @       ParquetScanNext
        @       ExecTableScan
        @       ExecProcNode
        @       ExecMotion
        @       ExecProcNode
        @       ExecutePlan
        @       ExecutorRun
        @       PortalRunSelect
        @       PortalRun
        @       PostgresMain
        @       BackendStartup
        @       ServerLoop
        @       PostmasterMain
        @       main
        @       __libc_start_main
        @       Unknown

retry the same node but disable read shortcircuit 
feature",,,,,,,,"SysLoggerMain","syslogger.c",518,
2016-09-26 13:02:25.680638 
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
 
{code}

Even though it correctly detected the bad checksum using the LocalBlockReader, 
when it calls the RemoteBlockReader it does not appear to detect the bad 
checksum, and the read is allowed to go through.

{code}
sr37228_repro=# select * from number;
    id
----------
 16777217
 16777216
        0
        1
        1
(5 rows)

Checking hdfs://hdm1.hdp.local:8020/hawq_default/16385/16543/17000/10 health

Connecting to namenode via 
http://hdm1.hdp.local:50070/fsck?ugi=gpadmin&blocks=1&locations=1&files=1&path=%2Fhawq_default%2F16385%2F16543%2F17000%2F10
FSCK started by gpadmin (auth:SIMPLE) from /172.28.21.157 for path 
/hawq_default/16385/16543/17000/10 at Mon Sep 26 12:07:53 PDT 2016
/hawq_default/16385/16543/17000/10 206 bytes, 1 block(s):  OK
0. BP-2023073008-172.28.21.63-1462922052672:blk_1073742008_1186 len=206 repl=3 
[DatanodeInfoWithStorage[172.28.21.155:50010,DS-1a18c785-48e5-4ab8-9228-b3f6857b952a,DISK],
 
DatanodeInfoWithStorage[172.28.19.211:50010,DS-6bf49ae7-6745-448b-803d-d12d93acad1d,DISK],
 
DatanodeInfoWithStorage[172.28.21.156:50010,DS-d22b0f7f-7065-42c4-bb66-ea361ec5e56a,DISK]]

Status: HEALTHY
 Total size:    206 B
 Total dirs:    0
 Total files:   1
 Total symlinks:                0
 Total blocks (validated):      1 (avg. block size 206 B)
 Minimally replicated blocks:   1 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          3
 Number of racks:               1
FSCK ended at Mon Sep 26 12:07:53 PDT 2016 in 0 milliseconds


The filesystem under path '/hawq_default/16385/16543/17000/10' is HEALTHY
{code}

The behavior of InputStreamImpl::setupBlockReader appears to be to: 

1. Attempt to read the block locally using LocalBlockReader
2. If the local block read fails, attempt to read the block from the next 
available node using RemoteBlockReader
3. Continue to read all the available blocks using RemoteBlockReader until we 
have no more blocks to read.

In this case, the RemoteBlockReader appears to ignore the bad checksum in the 
block, and returns wrong results.

Questions:

1. When we detect a bad checksum on the local block, why do we not mark the 
block as corrupt with the NameNode?
2. When we read the block using RemoteBlockReader, why doesn't it detect the 
bad block?





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to