Thank you for the responses!

@Jean-Marc This comes from fsck /. I see a flood of these, at least hundreds of them. For this particular region:

/hbase/data/default/table/ffa95306f599dbff99497e71841724fe/extracted/18c428413d7b4a89959911c9112a6eb9: CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block blk_1076062948
/hbase/data/default/table/ffa95306f599dbff99497e71841724fe/extracted/18c428413d7b4a89959911c9112a6eb9: MISSING 1 blocks of total size 52243482 B..
/hbase/data/default/table/ffa95306f599dbff99497e71841724fe/extracted/49b265ba5c7942b0b8e2b788fd9d7362: CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block blk_1076077963
/hbase/data/default/table/ffa95306f599dbff99497e71841724fe/extracted/49b265ba5c7942b0b8e2b788fd9d7362: MISSING 1 blocks of total size 6181 B...
/hbase/data/default/table/ffa95306f599dbff99497e71841724fe/pipeline/ef3fc67a835b451aa7d18094ea141451: CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block blk_1076062891
/hbase/data/default/table/ffa95306f599dbff99497e71841724fe/pipeline/ef3fc67a835b451aa7d18094ea141451: MISSING 1 blocks of total size 11747149 B..
/hbase/data/default/table/ffa95306f599dbff99497e71841724fe/pipeline/fedeb8062c454238bf1d1112b0f80b4b: CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block blk_1076077964
/hbase/data/default/table/ffa95306f599dbff99497e71841724fe/pipeline/fedeb8062c454238bf1d1112b0f80b4b: MISSING 1 blocks of total size 10431742 B..
/hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/35186fe43fed47989ddb4ace3648b109: CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block blk_1076062900
/hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/35186fe43fed47989ddb4ace3648b109: MISSING 1 blocks of total size 929610 B...
/hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/bd41ca895f3749188c08dd2e540bc127: CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block blk_1076077966
/hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/bd41ca895f3749188c08dd2e540bc127: MISSING 1 blocks of total size 119139 B.........

(...) ending with:

..........Status: CORRUPT
 Total size:    23155170955674 B (Total open files size: 1577 B)
 Total dirs:    21232
 Total files:   33311
 Total symlinks:                0 (Files currently being written: 61)
 Total blocks (validated):      199618 (avg. block size 115997409 B) (Total open file blocks (not validated): 19)
  ********************************
  CORRUPT FILES:        8245
  MISSING BLOCKS:       8245
  MISSING SIZE:         162010861748 B
  CORRUPT BLOCKS:       8245
  ********************************
 Minimally replicated blocks:   191373 (95.86961 %)
 Over-replicated blocks:        3241 (1.6236011 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     2.916185
 Corrupt blocks:                8245
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          17
 Number of racks:               1
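
With thousands of such lines, a per-region tally is easier to read than the raw flood. The following is only a minimal sketch, assuming the fsck output has been saved to a text file with one entry per line, and assuming the path layout /hbase/data/<namespace>/<table>/<region>/<family>/<hfile> seen above. The function name summarize_missing and the file name fsck.txt are hypothetical, chosen for illustration.

```python
import re
from collections import defaultdict

# Matches fsck lines such as:
#   /hbase/.../<region>/<family>/<hfile>: MISSING 1 blocks of total size 929610 B...
# search() is used so that leading progress dots do not prevent a match.
MISSING_RE = re.compile(
    r"(?P<path>/\S+): MISSING (?P<blocks>\d+) blocks of total size (?P<size>\d+) B"
)


def summarize_missing(lines):
    """Return {(region, family): [affected_files, missing_bytes]}.

    Assumes paths look like
    /hbase/data/<namespace>/<table>/<region>/<family>/<hfile>.
    Lines that do not match (CORRUPT lines, the summary) are skipped.
    """
    summary = defaultdict(lambda: [0, 0])
    for line in lines:
        match = MISSING_RE.search(line)
        if not match:
            continue
        parts = match.group("path").split("/")
        if len(parts) < 8:
            # Not an HFile path under the assumed layout; ignore it.
            continue
        region, family = parts[5], parts[6]
        entry = summary[(region, family)]
        entry[0] += 1
        entry[1] += int(match.group("size"))
    return dict(summary)


if __name__ == "__main__":
    # Hypothetical input file, e.g. produced with: hdfs fsck / > fsck.txt
    with open("fsck.txt", encoding="utf-8", errors="replace") as handle:
        for (region, family), (files, size) in sorted(summarize_missing(handle).items()):
            print(f"{region}/{family}: {files} files, {size} bytes missing")
```

Grouping by region and column family shows at a glance whether the damage is concentrated in a few regions or spread across the table.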
There are 8 files in the directories within /hbase/data/default/table/ffa95306f599dbff99497e71841724fe, so I imagine 6 of the 8 are affected. The size of the missing blocks ranges from about 2 KB up to ~70 MB. The table concerned had ~3500 regions. All datanodes are up and appear to be reporting correctly, so unfortunately there is no replica lying around.

@esteban I double-checked: the volumes seem fine and the total HDFS size also looks unchanged. Datanodes look fine. It is a single cluster (i.e. no cluster replication, if I'm answering the right question?), fresh from an upgrade from 0.94 to 0.98 (CDH 4.7 to 5.3), with HDFS replication set to 3.

Many thanks,
Mateusz

On 3 February 2015 at 20:30, Esteban Gutierrez <[email protected]> wrote:

> Hi Mateusz,
>
> As JMS mentioned, it is very likely the data is lost, but that type of
> corruption is usually due to some DNs being down or data volumes removed
> for some reason. Have you tried to recover that data from those DNs first?
>
> From "for what looks like a continuous stream of regions" it sounds like
> you had a single replica configured for HBase; is that the case?
>
> esteban.
>
> --
> Cloudera, Inc.
>
>
> On Tue, Feb 3, 2015 at 12:04 PM, Jean-Marc Spaggiari <
> [email protected]> wrote:
>
> > Hi Mateusz,
> >
> > Data from this HFile is most probably lost. Is the block also reported
> > as missing by fsck? Do you have any datanode down which might contain
> > this block? How big is this HFile? 929610 bytes only? If so, one option
> > might just be to delete this HFile.
> >
> > How many HFiles are within this region?
> >
> > JM
> >
> > 2015-02-03 10:04 GMT-08:00 Ellimilial K <[email protected]>:
> >
> > > We have recently experienced some issues with our namenodes in an HA
> > > arrangement and had to recreate namenode metadata from a backup, while
> > > some new data had been pushed to the region servers in the meantime.
> > > We're on HBase 0.98.6.
> > >
> > > After launching the cluster again, we have realised that we're missing
> > > ~8000/190000 blocks. Looking at fsck output, we can see, for what looks
> > > like a continuous stream of regions:
> > >
> > > /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/35186fe43fed47989ddb4ace3648b109:
> > > MISSING 1 blocks of total size 929610 B...
> > >
> > > /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/bd41ca895f3749188c08dd2e540bc127:
> > > CORRUPT blockpool BP-2037521063-<IP>-1418127576413 block blk_1076077966
> > >
> > > I did not want to run fsck -delete, and hbck complains because the files
> > > would not be allocated to region servers - reporting missing blocks.
> > >
> > > The total size of this table is circa 22TB on HDFS and recreating it
> > > would be quite a drag (pushing it from our previous hbase cluster took
> > > about a month). Is there any known way of dealing with such a situation?
> > >
> > > Mateusz Kaczyński
