Can you stop HBase and run fsck on Hadoop to check your HDFS health?
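A typical invocation of that check on Hadoop 1.x (the thread is on Hadoop 1.1.2) is sketched below; the `$HBASE_HOME`/`$HADOOP_HOME` paths are assumptions for illustration, not taken from the thread:

```sh
# Stop HBase first so no region server is writing to HDFS during the check
$HBASE_HOME/bin/stop-hbase.sh

# Report overall HDFS health; the extra flags list per-file block placement
$HADOOP_HOME/bin/hadoop fsck / -files -blocks -locations
```

A healthy filesystem ends the report with "The filesystem under path '/' is HEALTHY"; missing, corrupt, or under-replicated blocks are itemized otherwise.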
2013/10/24 Vimal Jain <[email protected]>

> Hi Ted/Jean,
> Can you please help here?
>
> On Tue, Oct 22, 2013 at 10:29 PM, Vimal Jain <[email protected]> wrote:
>
> > Hi Ted,
> > Yes, I checked the namenode and datanode logs, and I found the
> > exceptions below in both:
> >
> > Name node:
> > java.io.IOException: File /hbase/event_data/433b61f2a4ebff8f2e4b89890508a3b7/.tmp/99797a61a8f7471cb6df8f7b95f18e9e
> > could only be replicated to 0 nodes, instead of 1
> >
> > java.io.IOException: Got blockReceived message from unregistered or dead
> > node blk_-2949905629769882833_52274
> >
> > Data node:
> > 480000 millis timeout while waiting for channel to be ready for write.
> > ch : java.nio.channels.SocketChannel[connected local=/192.168.20.30:50010
> > remote=/192.168.20.30:36188]
> >
> > ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
> > DatanodeRegistration(192.168.20.30:50010,
> > storageID=DS-1816106352-192.168.20.30-50010-1369314076237, infoPort=50075,
> > ipcPort=50020):DataXceiver
> >
> > java.io.EOFException: while trying to read 39309 bytes
> >
> > On Tue, Oct 22, 2013 at 10:19 PM, Ted Yu <[email protected]> wrote:
> >
> > > bq. java.io.IOException: File /hbase/event_data/4c3765c51911d6c67037a983d205a010/.tmp/bfaf8df33d5b4068825e3664d3e4b2b0
> > > could only be replicated to 0 nodes, instead of 1
> > >
> > > Have you checked the Namenode / Datanode logs?
> > > It looks like HDFS was not stable.
> > >
> > > On Tue, Oct 22, 2013 at 9:01 AM, Vimal Jain <[email protected]> wrote:
> > >
> > > > Hi Jean,
> > > > Thanks for your reply.
> > > > I have 8 GB of memory in total, distributed as follows:
> > > >
> > > > Region server - 2 GB
> > > > Master, Namenode, Datanode, Secondary Namenode, Zookeeper - 1 GB
> > > > OS - 1 GB
> > > >
> > > > Please let me know if you need more information.
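The memory split described in the thread corresponds roughly to the heap settings in the env files of these versions; a sketch only, assuming defaults otherwise (values are in MB):

```shell
# hbase-env.sh -- sketch: max heap for HBase daemons (region server ~2 GB)
export HBASE_HEAPSIZE=2000

# hadoop-env.sh -- sketch: max heap for each Hadoop daemon
# (namenode, datanode, secondary namenode)
export HADOOP_HEAPSIZE=1000
```

Note that in HBase 0.94, `HBASE_HEAPSIZE` applies to every HBase daemon the start scripts launch; per-daemon overrides go in `HBASE_MASTER_OPTS` / `HBASE_REGIONSERVER_OPTS`.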
> > > >
> > > > On Tue, Oct 22, 2013 at 8:15 PM, Jean-Marc Spaggiari <[email protected]> wrote:
> > > >
> > > > > Hi Vimal,
> > > > >
> > > > > What are your settings? Memory of the host, and memory allocated for
> > > > > the different HBase services?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > JM
> > > > >
> > > > > 2013/10/22 Vimal Jain <[email protected]>
> > > > >
> > > > > > Hi,
> > > > > > I am running HBase in pseudo-distributed mode ( Hadoop version -
> > > > > > 1.1.2, HBase version - 0.94.7 ).
> > > > > > I am getting a few exceptions in both the Hadoop ( namenode,
> > > > > > datanode ) logs and the HBase ( region server ) log.
> > > > > > When I searched for these exceptions on Google, I concluded that
> > > > > > the problem is mainly due to a large number of full GCs in the
> > > > > > region server process.
> > > > > >
> > > > > > I used jstat and found a total of 950 full GCs in a span of 4 days
> > > > > > for the region server process. Is this ok?
> > > > > >
> > > > > > I am totally confused by the number of exceptions I am getting.
> > > > > > Also, I get the exceptions below intermittently.
> > > > > >
> > > > > > Region server:
> > > > > >
> > > > > > 2013-10-22 12:00:26,627 WARN org.apache.hadoop.ipc.HBaseServer:
> > > > > > (responseTooSlow):
> > > > > > {"processingtimems":15312,"call":"next(-6681408251916104762, 1000), rpc version=1, client version=29, methodsFingerPrint=-1368823753","client":"192.168.20.31:48270","starttimems":1382423411293,"queuetimems":0,"class":"HRegionServer","responsesize":4808556,"method":"next"}
> > > > > >
> > > > > > 2013-10-22 12:06:17,606 WARN org.apache.hadoop.ipc.HBaseServer:
> > > > > > (operationTooSlow):
> > > > > > {"processingtimems":14759,"client":"192.168.20.31:48247","timeRange":[0,9223372036854775807],"starttimems":1382423762845,"responsesize":61,"class":"HRegionServer","table":"event_data","cacheBlocks":true,"families":{"ginfo":["netGainPool"]},"row":"1629657","queuetimems":0,"method":"get","totalColumns":1,"maxVersions":1}
> > > > > >
> > > > > > 2013-10-18 10:37:45,008 WARN org.apache.hadoop.hdfs.DFSClient:
> > > > > > DataStreamer Exception: org.apache.hadoop.ipc.RemoteException:
> > > > > > java.io.IOException: File
> > > > > > /hbase/event_data/4c3765c51911d6c67037a983d205a010/.tmp/bfaf8df33d5b4068825e3664d3e4b2b0
> > > > > > could only be replicated to 0 nodes, instead of 1
> > > > > >   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639)
> > > > > >
> > > > > > Name node:
> > > > > > java.io.IOException: File
> > > > > > /hbase/event_data/433b61f2a4ebff8f2e4b89890508a3b7/.tmp/99797a61a8f7471cb6df8f7b95f18e9e
> > > > > > could only be replicated to 0 nodes, instead of 1
> > > > > >
> > > > > > java.io.IOException: Got blockReceived message from unregistered
> > > > > > or dead node blk_-2949905629769882833_52274
> > > > > >
> > > > > > Data node:
> > > > > > 480000 millis timeout while waiting for channel to be ready for
> > > > > > write. ch : java.nio.channels.SocketChannel[connected local=/192.168.20.30:50010
> > > > > > remote=/192.168.20.30:36188]
> > > > > >
> > > > > > ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
> > > > > > DatanodeRegistration(192.168.20.30:50010,
> > > > > > storageID=DS-1816106352-192.168.20.30-50010-1369314076237, infoPort=50075,
> > > > > > ipcPort=50020):DataXceiver
> > > > > > java.io.EOFException: while trying to read 39309 bytes
> > > > > >
> > > > > > --
> > > > > > Thanks and Regards,
> > > > > > Vimal Jain
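A note on the datanode's "480000 millis timeout" message above: 480000 ms is the default HDFS socket write timeout (8 minutes). The hdfs-site.xml property below is the knob behind it, shown with its default value only to identify the setting, not as a recommended change:

```xml
<!-- hdfs-site.xml: default shown; the datanode log's "480000 millis
     timeout" comes from this write timeout (8 minutes) -->
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>480000</value>
</property>
```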
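On the 950-full-GCs-in-4-days question earlier in the thread: `jstat -gcutil <pid>` reports the cumulative full-GC count in its FGC column, and the implied rate can be worked out directly. A sketch, not part of the original exchange:

```shell
# Rate implied by the figures in the thread: 950 full GCs over 4 days.
# (The cumulative count itself comes from the FGC column of
#  `jstat -gcutil <region_server_pid> 1000`.)
full_gcs=950
days=4
per_hour=$(( full_gcs / (days * 24) ))
echo "~${per_hour} full GCs per hour"   # prints "~9 full GCs per hour"
```

That is roughly one stop-the-world collection every six minutes. On a region server this is usually a sign of heap pressure, and long enough pauses can expire the ZooKeeper session and stall HDFS writes, which would be consistent with the intermittent exceptions reported.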
