2013/8/13 Vimal Jain <[email protected]>:

Hi,
I am using Hadoop and HBase in pseudo-distributed mode, Hadoop version 1.1.2 and HBase version 0.94.7.
Recently I found some exceptions in the Hadoop and HBase logs. I am not sure what has caused this. Requesting your help here.

Exception in the HMaster log:

2013-07-31 15:51:04,694 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 1266874891120ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2013-07-31 15:51:04,798 WARN org.apache.hadoop.hbase.master.CatalogJanitor: Failed scan of catalog table
org.apache.hadoop.hbase.client.ScannerTimeoutException: 82253ms passed since the last invocation, timeout is currently set to 60000
        at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:283)
        at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:727)
        at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:184)
        at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:169)
        at org.apache.hadoop.hbase.master.CatalogJanitor.getSplitParents(CatalogJanitor.java:123)
        at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:134)
        at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:92)
        at org.apache.hadoop.hbase.Chore.run(Chore.java:67)
        at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.hbase.UnknownScannerException: org.apache.hadoop.hbase.UnknownScannerException: Name: -8839286818925700393
        at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2544)
        at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:96)
        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:143)
        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:42)
        at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:164)
        at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:274)
        ... 8 more
2013-07-31 15:54:42,526 DEBUG org.apache.hadoop.hbase.client.ClientScanner: Creating scanner over .META. starting at key ''
2013-07-31 15:54:42,526 DEBUG org.apache.hadoop.hbase.client.ClientScanner: Advancing internal scanner to startKey at ''
2013-07-31 15:54:42,531 DEBUG org.apache.hadoop.hbase.client.ClientScanner: Finished with scanning at {NAME => '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192,}
2013-07-31 15:54:42,532 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 5 catalog row(s) and gc'd 0 unreferenced parent region(s)
2013-07-31 15:54:42,751 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=1 regions=5 average=5.0 mostloaded=5 leastloaded=5
2013-07-31 16:43:23,358 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 25771ms instead of 1000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2013-07-31 16:43:23,358 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 30091ms instead of 1000ms, ... (same warning)
2013-07-31 16:43:23,361 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 28613ms instead of 10000ms, ... (same warning)
2013-07-31 16:43:23,361 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 27457ms instead of 10000ms, ... (same warning)
2013-07-31 16:43:23,362 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 34587ms instead of 10000ms, ... (same warning)
2013-07-31 16:43:23,367 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 78600ms instead of 60000ms, ... (same warning)
2013-07-31 16:43:23,369 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 78562ms instead of 60000ms, ... (same warning)
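The ScannerTimeoutException above is self-describing: 82 seconds passed between two next() calls on the catalog scanner, while the scanner lease on the region server is only 60 seconds, so the lease expired and the follow-up call hit UnknownScannerException. In HBase 0.94 that lease is governed by hbase.regionserver.lease.period; a minimal hbase-site.xml sketch for raising it (the 300000 is an illustrative value, and raising the lease only hides pauses, it does not cure them):

    <!-- hbase-site.xml (HBase 0.94.x): scanner lease timeout -->
    <property>
      <name>hbase.regionserver.lease.period</name>
      <!-- default 60000 ms, matching "timeout is currently set to 60000" above -->
      <value>300000</value>
    </property>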
Exception in the RegionServer log:

java.io.IOException: Reflection
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:304)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1375)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1319)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1480)
        at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1271)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.GeneratedMethodAccessor156.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:302)
        ... 5 more
Caused by: java.io.IOException: DFSOutputStream is closed
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3669)
        at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
        at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:995)
        ... 9 more
2013-07-31 15:50:37,761 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not sync. Requesting close of hlog
2013-07-31 19:23:38,289 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file /hbase/.logs/hbase.xyz.com,60020,1370497170634/hbase.xyz.com%2C60020%2C1370497170634.1375265949987 retrying...
2013-07-31 19:23:38,289 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file /hbase/.logs/hbase.xyz.com,60020,1370497170634/hbase.xyz.com%2C60020%2C1370497170634.1375265949987 retrying...
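A "Could not sync. Requesting close of hlog" FATAL with "DFSOutputStream is closed" underneath is the same story seen from the RegionServer's side: the process stalled long enough that its WAL stream to HDFS was torn down. To confirm that the stalls really are garbage collection, GC logging can be switched on in hbase-env.sh; a minimal sketch using standard HotSpot flags (the log path is illustrative):

    # hbase-env.sh: record GC activity for the HBase daemons
    export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
        -XX:+PrintGCDateStamps -Xloggc:/var/log/hbase/gc-hbase.log"

Long pauses in that log lining up with the Sleeper warnings would settle the question.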
Exception in the DataNode log:

2013-07-31 19:39:51,094 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: org.apache.hadoop.ipc.RemoteException: java.io.IOException: Got blockReceived message from unregistered or dead node blk_-4787262105551508952_28369
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.blockReceived(FSNamesystem.java:4188)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.blockReceived(NameNode.java:1069)
        at sun.reflect.GeneratedMethodAccessor56.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)

        at org.apache.hadoop.ipc.Client.call(Client.java:1107)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
        at sun.proxy.$Proxy5.blockReceived(Unknown Source)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:1006)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1527)
        at java.lang.Thread.run(Thread.java:662)
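"Unregistered or dead node" means the NameNode had already written this DataNode off (missed heartbeats) by the time the blockReceived report arrived. A quick way to see what the NameNode currently thinks of its DataNodes, assuming the Hadoop 1.x command line:

    # Lists each DataNode with its status (in service / dead), capacity and last contact
    hadoop dfsadmin -report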
Exception in the NameNode log:

2013-07-31 19:39:50,671 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.blockReceived: blk_-4787262105551508952_28369 is received from dead or unregistered node 192.168.20.30:50010
2013-07-31 19:39:50,671 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop cause:java.io.IOException: Got blockReceived message from unregistered or dead node blk_-4787262105551508952_28369
2013-07-31 19:39:50,671 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9000, call blockReceived(DatanodeRegistration(192.168.20.30:50010, storageID=DS-1816106352-192.168.20.30-50010-1369314076237, infoPort=50075, ipcPort=50020), [Lorg.apache.hadoop.hdfs.protocol.Block;@64f2d559, [Ljava.lang.String;@294f9d6) from 192.168.20.30:59764: error: java.io.IOException: Got blockReceived message from unregistered or dead node blk_-4787262105551508952_28369
java.io.IOException: Got blockReceived message from unregistered or dead node blk_-4787262105551508952_28369
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.blockReceived(FSNamesystem.java:4188)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.blockReceived(NameNode.java:1069)
        at sun.reflect.GeneratedMethodAccessor56.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)

Thanks and Regards,
Vimal Jain
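All four logs point to the pattern described at the URL the Sleeper warnings cite (trouble.rs.runtime.zkexpired): the JVM pauses (GC or swap) for longer than the various leases, sessions and heartbeat intervals allow, so peers declare the process dead. One knob sometimes adjusted while the underlying pause is being fixed is the ZooKeeper session timeout; a hedged hbase-site.xml sketch (the value is illustrative, and the ZooKeeper server's own maxSessionTimeout caps what is actually granted):

    <!-- hbase-site.xml: how long ZooKeeper waits before expiring a silent RegionServer -->
    <property>
      <name>zookeeper.session.timeout</name>
      <value>180000</value>
    </property>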
On Tue, Aug 13, 2013 at 6:42 PM, Jean-Marc Spaggiari <[email protected]> wrote:

Hi Vimal,

What was your cluster doing at that time? Was it very busy? It looks like one server (192.168.20.30) became so busy that it failed to report in and was closed.

JM

2013/8/13 Vimal Jain <[email protected]>:

Hi Jean-Marc,
Thanks for your reply.
I have a one-node cluster (pseudo-distributed mode), so 192.168.20.30 is the only server, and it hosts all six processes (NameNode, DataNode, SecondaryNameNode, HMaster, HRegionServer and ZooKeeper).

At the time of this problem, I had given these processes the following memory (I had 4 GB in total on this box):

NameNode, DataNode, SecondaryNameNode - 1 GB each (default)
HMaster, ZooKeeper - 512 MB each
HRegionServer - 1.5 GB

So the total allocation is 5.5 GB, which exceeds physical memory. (I am planning to increase the RAM on this system to 8 GB and change the allocation to 2 GB for the HRegionServer and 1 GB for each of the other five.) Could that be the problem?
I bring up memory because I see the following in the HMaster log:

org.apache.hadoop.hbase.util.Sleeper: We slept 1266874891120ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad
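For reference, these heap sizes live in the env scripts of Hadoop 1.x and HBase 0.94; a sketch mirroring the split described above (values are illustrative, not a recommendation):

    # hadoop-env.sh: default heap, in MB, for NameNode/DataNode/SecondaryNameNode
    export HADOOP_HEAPSIZE=1000

    # hbase-env.sh: default heap, in MB, for the HBase daemons
    export HBASE_HEAPSIZE=1000
    # per-daemon overrides, e.g. a larger RegionServer heap
    export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xmx2g"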
On Tue, Aug 13, 2013 at 9:12 PM, Jean-Marc Spaggiari <[email protected]> wrote:

Hi Vimal,

4 GB for all those processes is very tight... You might want to run in standalone mode instead of pseudo-distributed; that will save you some memory. Have you checked whether your server is swapping? Swapping will make it slow, and then you will miss heartbeats and processes will shut down.

JM
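Whether the box swaps under load is easy to check with stock Linux tools; a small sketch:

    # si/so columns persistently non-zero under load means the box is swapping
    vmstat 5
    # snapshot of swap currently in use
    free -m
    # current swappiness (0 tells the kernel to avoid swapping application memory)
    cat /proc/sys/vm/swappiness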
2013/8/13 Vimal Jain <[email protected]>:

Yes, that is the reason I am planning to make it 8 GB.
I am running in pseudo-distributed mode because I will expand to a 2-3 node cluster as my data size increases in the future.
Also, I have disabled swapping (I have set vm.swappiness to 0).

2013/8/13 Jean-Marc Spaggiari <[email protected]>:

Ok, that's fine. I will say, don't be surprised to see failures like these on a single pseudo-distributed node with only 4 GB. Go slowly with it, and don't start big jobs. And if something fails, monitor memory usage, CPU usage, etc.

JM
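A hedged sketch of the kind of monitoring suggested here, combining the OS view with the JVM view (jps and jstat ship with the JDK; the 5-second interval is arbitrary):

    # OS view: per-process CPU and memory
    top
    free -m
    # JVM view: find the daemon's PID, then watch its GC behaviour
    jps
    jstat -gcutil <pid> 5s

If a daemon "goes away" again, the GC columns from jstat usually show whether a stop-the-world collection was running at that moment.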
