That's quite horrible. Oh well, thanks for the help! Yes, positive: we only started having issues with the HA quorum a couple of days after the migration. HBase has constantly been taking ~200 requests a second via Stargate, and things seemed to work fine.
Mateusz

On 3 February 2015 at 22:11, Jean-Marc Spaggiari <[email protected]> wrote:

> Those files and related data are most probably lost... I don't see any
> other option than deleting them.
>
> Are you sure those blocks were not missing before the migration? Did you
> have any crash during the migration process?
>
> JM
>
> 2015-02-03 13:14 GMT-08:00 Ellimilial K <[email protected]>:
>
> > Thank you for the responses!
> >
> > @Jean-Marc
> > This comes from fsck /. I see a flood of those, running to at least
> > hundreds, for this particular region:
> >
> > /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/extracted/18c428413d7b4a89959911c9112a6eb9:
> > CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block blk_1076062948
> >
> > /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/extracted/18c428413d7b4a89959911c9112a6eb9:
> > MISSING 1 blocks of total size 52243482 B..
> >
> > /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/extracted/49b265ba5c7942b0b8e2b788fd9d7362:
> > CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block blk_1076077963
> >
> > /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/extracted/49b265ba5c7942b0b8e2b788fd9d7362:
> > MISSING 1 blocks of total size 6181 B...
> >
> > /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/pipeline/ef3fc67a835b451aa7d18094ea141451:
> > CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block blk_1076062891
> >
> > /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/pipeline/ef3fc67a835b451aa7d18094ea141451:
> > MISSING 1 blocks of total size 11747149 B..
> >
> > /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/pipeline/fedeb8062c454238bf1d1112b0f80b4b:
> > CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block blk_1076077964
> >
> > /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/pipeline/fedeb8062c454238bf1d1112b0f80b4b:
> > MISSING 1 blocks of total size 10431742 B..
> >
> > /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/35186fe43fed47989ddb4ace3648b109:
> > CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block blk_1076062900
> >
> > /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/35186fe43fed47989ddb4ace3648b109:
> > MISSING 1 blocks of total size 929610 B...
> >
> > /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/bd41ca895f3749188c08dd2e540bc127:
> > CORRUPT blockpool BP-2037521063-37.59.17.102-1418127576413 block blk_1076077966
> >
> > /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/bd41ca895f3749188c08dd2e540bc127:
> > MISSING 1 blocks of total size 119139 B.........
> >
> > (...) ending with:
> >
> > ..........Status: CORRUPT
> >  Total size: 23155170955674 B (Total open files size: 1577 B)
> >  Total dirs: 21232
> >  Total files: 33311
> >  Total symlinks: 0 (Files currently being written: 61)
> >  Total blocks (validated): 199618 (avg. block size 115997409 B) (Total open file blocks (not validated): 19)
> >  ********************************
> >  CORRUPT FILES: 8245
> >  MISSING BLOCKS: 8245
> >  MISSING SIZE: 162010861748 B
> >  CORRUPT BLOCKS: 8245
> >  ********************************
> >  Minimally replicated blocks: 191373 (95.86961 %)
> >  Over-replicated blocks: 3241 (1.6236011 %)
> >  Under-replicated blocks: 0 (0.0 %)
> >  Mis-replicated blocks: 0 (0.0 %)
> >  Default replication factor: 3
> >  Average block replication: 2.916185
> >  Corrupt blocks: 8245
> >  Missing replicas: 0 (0.0 %)
> >  Number of data-nodes: 17
> >  Number of racks: 1
> >
> > There are 8 files in directories within
> > hbase/data/default/table/ffa95306f599dbff99497e71841724fe, so I imagine
> > 6/8 are affected.
> > The size of the missing blocks ranges from 2 KB up to ~70 MB. The table
> > concerned had ~3500 regions. All datanodes are up and look like they
> > report correctly, so unfortunately there is no replica lying around.
> >
> > @esteban I double checked: the volumes seem fine, and the total HDFS size
> > also looks unchanged. Datanodes look fine. It is a single cluster (i.e. no
> > cluster replication, if I'm answering the question?), freshly after an
> > upgrade to 0.98 from 0.94 (or CDH 4.7 to 5.3), with HDFS replication set
> > to 3.
> >
> > Many thanks,
> > Mateusz
> >
> > On 3 February 2015 at 20:30, Esteban Gutierrez <[email protected]> wrote:
> >
> > > Hi Mateusz,
> > >
> > > As JMS mentioned, it is very likely the data is lost, but that type of
> > > corruption is usually due to some DNs being down or data volumes having
> > > been removed for some reason. Have you tried to recover that data from
> > > those DNs first?
> > >
> > > From "for what looks like a continuous stream of regions" it sounds
> > > like you had a single replica configured for HBase. Is that the case?
> > >
> > > esteban.
> > >
> > > --
> > > Cloudera, Inc.
> > >
> > > On Tue, Feb 3, 2015 at 12:04 PM, Jean-Marc Spaggiari <[email protected]> wrote:
> > >
> > > > Hi Mateusz,
> > > >
> > > > Data from this HFile is most probably lost. Is the block also
> > > > reported missing by fsck? Do you have any datanode down which might
> > > > contain this block? How big is this HFile? 929610 bytes only? If so,
> > > > one option might be just to delete this HFile.
> > > >
> > > > How many HFiles are within this region?
> > > >
> > > > JM
> > > >
> > > > 2015-02-03 10:04 GMT-08:00 Ellimilial K <[email protected]>:
> > > >
> > > > > We have recently experienced some issues with our namenodes in an
> > > > > HA arrangement and had to recreate the namenode metadata from a
> > > > > backup, while some new data had been pushed to the region servers
> > > > > in the meantime. We're on HBase 0.98.6.
> > > > >
> > > > > After launching the cluster again, we realised that we're missing
> > > > > ~8000/190000 blocks. Looking at the fsck output, we can see, for
> > > > > what looks like a continuous stream of regions:
> > > > >
> > > > > /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/35186fe43fed47989ddb4ace3648b109:
> > > > > MISSING 1 blocks of total size 929610 B...
> > > > >
> > > > > /hbase/data/default/table/ffa95306f599dbff99497e71841724fe/processed/bd41ca895f3749188c08dd2e540bc127:
> > > > > CORRUPT blockpool BP-2037521063-<IP>-1418127576413 block blk_1076077966
> > > > >
> > > > > I did not want to run fsck -delete, and hbck complains because the
> > > > > files would not be allocated to region servers, reporting missing
> > > > > blocks.
> > > > >
> > > > > The total size of this table is circa 22 TB on HDFS and recreating
> > > > > it would be quite a drag (pushing it from our previous HBase
> > > > > cluster took about a month). Is there any known way of dealing with
> > > > > such a situation?
> > > > >
> > > > > Mateusz Kaczyński
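
For anyone triaging a similar report, here is a minimal sketch of how the MISSING entries in saved fsck output could be tallied per region directory before deciding what to delete. It is not part of the thread's advice. The function name `summarize_missing` is hypothetical, and the code assumes that each path and its MISSING message sit on one line (the mail client wrapped them above) and that paths follow the /hbase/data/<namespace>/<table>/<region>/<family>/<hfile> layout seen in the output.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   /hbase/data/default/table/<region>/<family>/<hfile>: MISSING 1 blocks of total size 929610 B...
# Assumption: path and message are on the same line.
MISSING_RE = re.compile(r"(/\S+): MISSING (\d+) blocks of total size (\d+) B")


def summarize_missing(fsck_lines):
    """Group MISSING entries by region directory.

    Returns {region: {"files": set of HFile paths, "blocks": int, "bytes": int}}.
    Lines that are not MISSING entries, or whose path does not have the
    assumed HBase layout, are skipped.
    """
    summary = defaultdict(lambda: {"files": set(), "blocks": 0, "bytes": 0})
    for line in fsck_lines:
        match = MISSING_RE.search(line)
        if match is None:
            continue
        path, blocks, size = match.groups()
        parts = path.strip("/").split("/")
        # Expected: hbase, data, <namespace>, <table>, <region>, <family>, <hfile>
        if len(parts) < 7 or parts[0] != "hbase" or parts[1] != "data":
            continue
        entry = summary[parts[4]]
        entry["files"].add(path)
        entry["blocks"] += int(blocks)
        entry["bytes"] += int(size)
    return dict(summary)
```

Feeding it the lines of a saved fsck report gives, per region, the set of affected HFiles plus the missing block count and byte total, which makes it easier to see whether whole regions or only individual HFiles are hit.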
