-------- Forwarded Message --------
Subject: Re: Problem during compacting a table
Date: Wed, 05 Aug 2015 11:24:28 -0400
From: Josh Elser <[email protected]>
Reply-To: [email protected]
To: [email protected]
I'm not really sure what that error message means without doing more
digging. Copying your email to [email protected] might shed some
light on what the error means if you want to try that.
mohit.kaushik wrote:
These errors are shown in the logs of the Hadoop namenode and the slaves...
*Namenode log*
2015-08-05 12:05:14,518 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment
at 391508
2015-08-05 12:05:14,664 INFO BlockStateChange: BLOCK* ask
192.168.10.121:50010 to replicate blk_1073780327_39560 to datanode(s)
192.168.10.122:50010
2015-08-05 12:05:14,664 INFO BlockStateChange: BLOCK* ask
192.168.10.121:50010 to replicate blk_1073780379_39612 to datanode(s)
192.168.10.122:50010
2015-08-05 12:05:24,621 INFO BlockStateChange: BLOCK* addStoredBlock:
blockMap updated: 192.168.10.122:50010 is added to blk_1073782847_42080
size 134217728
2015-08-05 12:05:26,665 INFO BlockStateChange: BLOCK* ask
192.168.10.121:50010 to replicate blk_1073780611_39844 to datanode(s)
192.168.10.122:50010
2015-08-05 12:05:27,232 INFO BlockStateChange: BLOCK* addStoredBlock:
blockMap updated: 192.168.10.122:50010 is added to blk_1073793941_53178
size 134217728
2015-08-05 12:05:27,950 INFO BlockStateChange: BLOCK* addStoredBlock:
blockMap updated: 192.168.10.122:50010 is added to blk_1073783859_43092
size 134217728
2015-08-05 12:05:28,798 INFO BlockStateChange: BLOCK* addStoredBlock:
blockMap updated: 192.168.10.122:50010 is added to blk_1073793387_52620
size 22496
2015-08-05 12:05:29,666 INFO BlockStateChange: BLOCK* ask
192.168.10.123:50010 to replicate blk_1073780678_39911 to datanode(s)
192.168.10.121:50010
2015-08-05 12:05:29,666 INFO BlockStateChange: BLOCK* ask
192.168.10.121:50010 to replicate blk_1073780682_39915 to datanode(s)
192.168.10.122:50010
2015-08-05 12:05:32,002 INFO BlockStateChange: BLOCK* addStoredBlock:
blockMap updated: 192.168.10.122:50010 is added to
blk_1073796582_55826{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
primaryNodeIndex=-1,
replicas=[ReplicaUC[[DISK]DS-896dada5-52c0-4a69-beed-dfbc5d437fc6:NORMAL:192.168.10.123:50010|RBW],
ReplicaUC[[DISK]DS-dd6d6a25-122f-4958-a20b-4ccb82f49f11:NORMAL:192.168.10.121:50010|RBW],
ReplicaUC[[DISK]DS-188489f9-89d3-40bd-9d20-9db358d644c9:NORMAL:192.168.10.122:50010|RBW]]}
size 0
2015-08-05 12:05:32,072 INFO BlockStateChange: BLOCK* addStoredBlock:
blockMap updated: 192.168.10.121:50010 is added to
blk_1073796582_55826{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
primaryNodeIndex=-1,
replicas=[ReplicaUC[[DISK]DS-896dada5-52c0-4a69-beed-dfbc5d437fc6:NORMAL:192.168.10.123:50010|RBW],
ReplicaUC[[DISK]DS-dd6d6a25-122f-4958-a20b-4ccb82f49f11:NORMAL:192.168.10.121:50010|RBW],
ReplicaUC[[DISK]DS-188489f9-89d3-40bd-9d20-9db358d644c9:NORMAL:192.168.10.122:50010|RBW]]}
size 0
2015-08-05 12:05:32,129 INFO BlockStateChange: BLOCK* addStoredBlock:
blockMap updated: 192.168.10.123:50010 is added to
blk_1073796582_55826{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
primaryNodeIndex=-1,
replicas=[ReplicaUC[[DISK]DS-896dada5-52c0-4a69-beed-dfbc5d437fc6:NORMAL:192.168.10.123:50010|RBW],
ReplicaUC[[DISK]DS-dd6d6a25-122f-4958-a20b-4ccb82f49f11:NORMAL:192.168.10.121:50010|RBW],
ReplicaUC[[DISK]DS-188489f9-89d3-40bd-9d20-9db358d644c9:NORMAL:192.168.10.122:50010|RBW]]}
size 0 ... and more
*Slave log* (too many)
k_1073794728_53972 on DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because
the block scanner is disabled.
2015-08-05 11:50:30,438 INFO
org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning
suspicious block
BP-2102462487-192.168.10.124-1436956492274:blk_1073794738_53982 on
DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is
disabled.
2015-08-05 11:50:31,024 INFO
org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning
suspicious block
BP-2102462487-192.168.10.124-1436956492274:blk_1073794728_53972 on
DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is
disabled.
2015-08-05 11:50:31,027 INFO
org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning
suspicious block
BP-2102462487-192.168.10.124-1436956492274:blk_1073794738_53982 on
DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is
disabled.
2015-08-05 11:50:31,095 INFO
org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning
suspicious block
BP-2102462487-192.168.10.124-1436956492274:blk_1073794740_53984 on
DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is
disabled.
2015-08-05 11:50:31,105 INFO
org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning
suspicious block
BP-2102462487-192.168.10.124-1436956492274:blk_1073794740_53984 on
DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is
disabled.
2015-08-05 11:50:31,136 INFO
org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning
suspicious block
BP-2102462487-192.168.10.124-1436956492274:blk_1073794740_53984 on
DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is
disabled.
2015-08-05 11:50:31,136 INFO
org.apache.hadoop.hdfs.server.datanode.BlockScanner: Not scanning
suspicious block
BP-2102462487-192.168.10.124-1436956492274:blk_1073794740_53984 on
DS-896dada5-52c0-4a69-beed-dfbc5d437fc6, because the block scanner is
disabled.
I am using locality groups, so I *need* to compact tables. Please
explain how I can get rid of the suspicious blocks.
Thanks
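For reference, the block IDs in those "Not scanning suspicious block"
messages can be mapped back to the files that own them with the standard
hdfs fsck CLI. A minimal sketch, assuming the hdfs command is on the path
and using a block ID taken from the slave log above:

  # List any files HDFS currently considers corrupt (all replicas bad/missing).
  hdfs fsck / -list-corruptfileblocks

  # Walk the Accumulo tree, printing each file with its blocks and replica
  # locations, then search for one of the reported block IDs.
  hdfs fsck /accumulo -files -blocks -locations | grep blk_1073794728

The "because the block scanner is disabled" wording means the DataNode
wanted to re-verify those replicas but its periodic scanner is turned off
in this configuration; fsck is the quickest way to see whether any file
actually lost replicas.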
On 08/05/2015 10:53 AM, mohit.kaushik wrote:
Yes, one of my datanodes was down because its disk was detached for some
time, and the tserver for that node was lost, but it is up and running
again. fsck shows that the file system is healthy, but there are many
messages reporting under-replicated blocks; my replication factor is 3,
yet it says 5 replicas are required.
/user/root/.Trash/Current/accumulo/tables/+r/root_tablet/delete+A0000d29.rf+F0000d28.rf:
Under replicated
BP-2102462487-192.168.10.124-1436956492274:blk_1073796198_55442.
Target Replicas is 5 but found 3 replica(s).
Thanks & Regards
Mohit Kaushik
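For context on the "Target Replicas is 5 but found 3" line: the path is a
root tablet file, and Accumulo by default requests a higher per-file
replication for its root/metadata tables (table.file.replication=5, if
memory serves) than the HDFS default, so a 3-node cluster will always
report those files as under-replicated. A hedged sketch of how to check
and, if desired, cap that at the cluster size (table names as in Accumulo
1.6; lowering this trades away some metadata redundancy):

  # In the Accumulo shell: what replication do the root/metadata tables request?
  config -t accumulo.root -f table.file.replication
  config -t accumulo.metadata -f table.file.replication

  # Optionally cap it at the number of datanodes:
  config -t accumulo.root -s table.file.replication=3
  config -t accumulo.metadata -s table.file.replication=3

  # Files already written with a target of 5 can be adjusted from HDFS
  # (setrep on a directory applies recursively):
  hdfs dfs -setrep -w 3 /accumulo/tables/+r

The file in the fsck output sits under /user/root/.Trash, so it will also
simply age out of the trash on its own.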
On 08/04/2015 09:18 PM, John Vines wrote:
It looks like an HDFS issue. Did a datanode go down? Did you turn
replication down to 1? The combination of those two errors would
definitely cause the problems you're seeing, as the latter disables any
sort of robustness of the underlying filesystem.
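Both questions can be checked from the HDFS side. A minimal sketch,
assuming the standard Hadoop 2.7 CLI:

  # Live vs. dead datanodes (a detached disk usually shows up here):
  hdfs dfsadmin -report

  # The default replication factor the clients/NameNode are using:
  hdfs getconf -confKey dfs.replication

  # Overall namespace health, including under-replicated block counts:
  hdfs fsck /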
On Tue, Aug 4, 2015 at 8:10 AM mohit.kaushik
<[email protected] <mailto:[email protected]>> wrote:
On 08/04/2015 05:35 PM, mohit.kaushik wrote:
Hello All,
I am using Apache Accumulo 1.6.3 with Apache Hadoop 2.7.0 on a 3-node
cluster. When I issue the compact command from the shell, it gives the
following warning.
root@orkash testScan> compact -w
2015-08-04 17:10:52,702 [Shell.audit] INFO : root@orkash
testScan> compact -w
2015-08-04 17:10:52,706 [shell.Shell] INFO : Compacting table ...
2015-08-04 17:12:53,986 [impl.ThriftTransportPool] *WARN :
Thread "shell" stuck on IO to orkash4:9999 (0) for at least
120034 ms*
The tablet servers show a problem regarding a data block, which looks
like HDFS-8659
<https://issues.apache.org/jira/browse/HDFS-8659>:
2015-08-04 15:00:27,825 [hdfs.DFSClient] WARN : Failed to
connect to /192.168.10.121:50010
for block, add to deadNodes and continue. java.io.IOException:
Got error, status message opReadBlock
BP-2102462487-192.168.10.124-1436956492274:blk_1073780678_39911
received exception
org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException:
Replica not found for
BP-2102462487-192.168.10.124-1436956492274:blk_1073780678_39911,
for OP_READ_BLOCK, self=/192.168.10.121:38752,
remote=/192.168.10.121:50010, for file
/accumulo/tables/h/t-000016s/F000016t.rf, for pool
BP-2102462487-192.168.10.124-1436956492274 block 1073780678_39911
java.io.IOException: Got error, status message opReadBlock
BP-2102462487-192.168.10.124-1436956492274:blk_1073780678_39911
received exception
org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException:
Replica not found for
BP-2102462487-192.168.10.124-1436956492274:blk_1073780678_39911,
for OP_READ_BLOCK, self=/192.168.10.121:38752,
remote=/192.168.10.121:50010, for file
/accumulo/tables/h/t-000016s/F000016t.rf, for pool
BP-2102462487-192.168.10.124-1436956492274 block 1073780678_39911
    at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:140)
    at org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:456)
    at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:424)
    at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:814)
    at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:693)
    at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:352)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:618)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:844)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:896)
    at java.io.DataInputStream.read(DataInputStream.java:149)
    at org.apache.accumulo.core.file.rfile.bcfile.BoundedRangeFileInputStream$1.run(BoundedRangeFileInputStream.java:104)
    at org.apache.accumulo.core.file.rfile.bcfile.BoundedRangeFileInputStream$1.run(BoundedRangeFileInputStream.java:100)
    at java.security.AccessController.doPrivileged(Native Method)
    at org.apache.accumulo.core.file.rfile.bcfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:100)
    at org.apache.hadoop.io.compress.DecompressorStream.getCompressedData(DecompressorStream.java:159)
    at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:143)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.accumulo.core.file.rfile.MultiLevelIndex$IndexBlock.readFields(MultiLevelIndex.java:269)
    at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader.getIndexBlock(MultiLevelIndex.java:724)
    at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader.access$100(MultiLevelIndex.java:497)
    at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader$Node.getNext(MultiLevelIndex.java:587)
    at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader$Node.getNextNode(MultiLevelIndex.java:593)
    at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader$IndexIterator.getNextNode(MultiLevelIndex.java:616)
    at org.apache.accumulo.core.file.rfile.MultiLevelIndex$Reader$IndexIterator.next(MultiLevelIndex.java:659)
    at org.apache.accumulo.core.file.rfile.RFile$LocalityGroupReader._next(RFile.java:559)
Regards
Mohit Kaushik
And the compaction never completes.
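For reference, the ReplicaNotFoundException above names a concrete file
and block, so it can be checked directly. A sketch, assuming the standard
hdfs CLI; the setrep bounce at the end is only a commonly suggested
workaround, not a procedure taken from this thread:

  # Which datanodes hold each block of the RFile the compaction is reading?
  hdfs fsck /accumulo/tables/h/t-000016s/F000016t.rf -files -blocks -locations

  # If healthy replicas exist on the other two nodes, lowering and then
  # restoring the replication target prods the NameNode into scheduling
  # fresh copies of the block:
  hdfs dfs -setrep -w 2 /accumulo/tables/h/t-000016s/F000016t.rf
  hdfs dfs -setrep -w 3 /accumulo/tables/h/t-000016s/F000016t.rf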