Yes, that is the reason I am planning to make it 8 GB. I am running in pseudo-distributed mode because I will expand to a 2-3 node cluster as my data size increases in the future. Also, I have disabled swapping (I have set vm.swappiness to 0).
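
In case it helps someone else on the list, this is roughly how I disabled it (a minimal sketch; vm.swappiness is the standard Linux sysctl, but the way to persist it can vary by distro):

    # apply immediately (does not survive a reboot)
    sudo sysctl -w vm.swappiness=0

    # persist across reboots, then reload
    echo "vm.swappiness = 0" | sudo tee -a /etc/sysctl.conf
    sudo sysctl -p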
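
And for the record, once the box has 8 GB I intend to split the heaps as discussed below (2 GB for the region server, 1 GB for each of the other 5 daemons). This is only a sketch of my plan: the variable names are the standard ones from the default hadoop-env.sh (Hadoop 1.1.2) and hbase-env.sh (HBase 0.94.7), and the CMS flags are an untested assumption on my part, following the GC-tuning advice in the HBase book for reducing long pauses:

    # hadoop-env.sh -- heap in MB, applies to each Hadoop daemon
    # (NameNode, DataNode, SecondaryNameNode), i.e. 1 GB each
    export HADOOP_HEAPSIZE=1024

    # hbase-env.sh -- per-daemon heaps
    export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Xmx1g"
    export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS -Xmx1g"
    export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xmx2g \
      -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"

That still leaves 8 - (3 + 1 + 1 + 2) = 1 GB of headroom for the OS and page cache.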
On Tue, Aug 13, 2013 at 9:12 PM, Jean-Marc Spaggiari <[email protected]> wrote:

> Hi Vimal,
>
> 4 GB for all the processes is very short... You might want to run in
> standalone mode instead of pseudo-distributed. That will save you some
> memory. Have you checked if your server is swapping? That will make it
> slow, and then you will miss some heartbeats and processes will close...
>
> JM
>
> 2013/8/13 Vimal Jain <[email protected]>
>
> > Hi Jean-Marc,
> > Thanks for your reply.
> > I have a one-node cluster (pseudo-distributed mode), so 192.168.20.30
> > is the only server, and it hosts all 6 processes (NameNode, DataNode,
> > SecondaryNameNode, HMaster, HRegionServer and ZooKeeper).
> >
> > At the time of this problem, I had given the following memory to these
> > processes (I had 4 GB total memory on this box):
> >
> > NameNode, DataNode, SecondaryNameNode - 1 GB each (default)
> > HMaster, ZooKeeper - 512 MB each
> > HRegionServer - 1.5 GB
> >
> > So the total memory allocation is 5.5 GB, which exceeds physical
> > memory. (I am planning to increase the RAM on this system to 8 GB and
> > change the allocation to 2 GB for HRegionServer and 1 GB for each of
> > the other 5.)
> > So could that be the problem?
> > I am asking about memory because I see the following in the HMaster log:
> >
> > *org.apache.hadoop.hbase.util.Sleeper: We slept 1266874891120ms instead
> > of 10000ms, this is likely due to a long garbage collecting pause and
> > it's usually bad*
> >
> > On Tue, Aug 13, 2013 at 6:42 PM, Jean-Marc Spaggiari <
> > [email protected]> wrote:
> >
> > > Hi Vimal,
> > >
> > > What was your cluster doing at that time? Was it very busy? Looks
> > > like one server (192.168.20.30) got so busy that it failed to report
> > > active and closed.
> > >
> > > JM
> > >
> > > 2013/8/13 Vimal Jain <[email protected]>
> > >
> > > > Hi,
> > > > I am using Hadoop and HBase in pseudo-distributed mode.
> > > > I am using Hadoop version 1.1.2 and HBase version 0.94.7.
> > > >
> > > > Recently I found some exceptions in the Hadoop and HBase logs.
> > > > I am not sure what has caused this.
> > > > Requesting you to please help here.
> > > >
> > > > *Exception in Master log:*
> > > > 2013-07-31 15:51:04,694 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 1266874891120ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > > > 2013-07-31 15:51:04,798 WARN org.apache.hadoop.hbase.master.CatalogJanitor: Failed scan of catalog table
> > > > org.apache.hadoop.hbase.client.ScannerTimeoutException: 82253ms passed since the last invocation, timeout is currently set to 60000
> > > >     at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:283)
> > > >     at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:727)
> > > >     at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:184)
> > > >     at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:169)
> > > >     at org.apache.hadoop.hbase.master.CatalogJanitor.getSplitParents(CatalogJanitor.java:123)
> > > >     at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:134)
> > > >     at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:92)
> > > >     at org.apache.hadoop.hbase.Chore.run(Chore.java:67)
> > > >     at java.lang.Thread.run(Thread.java:662)
> > > > Caused by: org.apache.hadoop.hbase.UnknownScannerException: org.apache.hadoop.hbase.UnknownScannerException: Name: -8839286818925700393
> > > >     at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2544)
> > > >     at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
> > > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > > >     at java.lang.reflect.Method.invoke(Method.java:597)
> > > >     at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
> > > >     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
> > > >
> > > >     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> > > >     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> > > >     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> > > >     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> > > >     at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:96)
> > > >     at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:143)
> > > >     at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:42)
> > > >     at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:164)
> > > >     at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:274)
> > > >     ... 8 more
> > > > 2013-07-31 15:54:42,526 DEBUG org.apache.hadoop.hbase.client.ClientScanner: Creating scanner over .META. starting at key ''
> > > > 2013-07-31 15:54:42,526 DEBUG org.apache.hadoop.hbase.client.ClientScanner: Advancing internal scanner to startKey at ''
> > > > 2013-07-31 15:54:42,531 DEBUG org.apache.hadoop.hbase.client.ClientScanner: Finished with scanning at {NAME => '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192,}
> > > > 2013-07-31 15:54:42,532 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 5 catalog row(s) and gc'd 0 unreferenced parent region(s)
> > > > 2013-07-31 15:54:42,751 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=1 regions=5 average=5.0 mostloaded=5 leastloaded=5
> > > >
> > > > 2013-07-31 16:43:23,358 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 25771ms instead of 1000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > > > 2013-07-31 16:43:23,358 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 30091ms instead of 1000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > > > 2013-07-31 16:43:23,361 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 28613ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > > > 2013-07-31 16:43:23,361 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 27457ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > > > 2013-07-31 16:43:23,362 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 34587ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > > > 2013-07-31 16:43:23,367 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 78600ms instead of 60000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > > > 2013-07-31 16:43:23,369 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 78562ms instead of 60000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > > >
> > > > *Exception in RegionServer log:*
> > > > java.io.IOException: Reflection
> > > >     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:304)
> > > >     at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1375)
> > > >     at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1319)
> > > >     at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1480)
> > > >     at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1271)
> > > >     at java.lang.Thread.run(Thread.java:662)
> > > > Caused by: java.lang.reflect.InvocationTargetException
> > > >     at sun.reflect.GeneratedMethodAccessor156.invoke(Unknown Source)
> > > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > > >     at java.lang.reflect.Method.invoke(Method.java:597)
> > > >     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:302)
> > > >     ... 5 more
> > > > Caused by: java.io.IOException: DFSOutputStream is closed
> > > >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3669)
> > > >     at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
> > > >     at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:995)
> > > >     ... 9 more
> > > > 2013-07-31 15:50:37,761 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not sync. Requesting close of hlog
> > > >
> > > > 2013-07-31 19:23:38,289 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file /hbase/.logs/hbase.xyz.com,60020,1370497170634/hbase.xyz.com%2C60020%2C1370497170634.1375265949987 retrying...
> > > > 2013-07-31 19:23:38,289 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file /hbase/.logs/hbase.xyz.com,60020,1370497170634/hbase.xyz.com%2C60020%2C1370497170634.1375265949987 retrying...
> > > >
> > > > *Exception in DataNode log:*
> > > > 2013-07-31 19:39:51,094 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: org.apache.hadoop.ipc.RemoteException: java.io.IOException: Got blockReceived message from unregistered or dead node blk_-4787262105551508952_28369
> > > >     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.blockReceived(FSNamesystem.java:4188)
> > > >     at org.apache.hadoop.hdfs.server.namenode.NameNode.blockReceived(NameNode.java:1069)
> > > >     at sun.reflect.GeneratedMethodAccessor56.invoke(Unknown Source)
> > > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > > >     at java.lang.reflect.Method.invoke(Method.java:597)
> > > >     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
> > > >     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
> > > >     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
> > > >     at java.security.AccessController.doPrivileged(Native Method)
> > > >     at javax.security.auth.Subject.doAs(Subject.java:396)
> > > >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
> > > >     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)
> > > >
> > > >     at org.apache.hadoop.ipc.Client.call(Client.java:1107)
> > > >     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
> > > >     at sun.proxy.$Proxy5.blockReceived(Unknown Source)
> > > >     at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:1006)
> > > >     at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1527)
> > > >     at java.lang.Thread.run(Thread.java:662)
> > > >
> > > > *Exception in NameNode log:*
> > > > 2013-07-31 19:39:50,671 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.blockReceived: blk_-4787262105551508952_28369 is received from dead or unregistered node 192.168.20.30:50010
> > > > 2013-07-31 19:39:50,671 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop cause:java.io.IOException: Got blockReceived message from unregistered or dead node blk_-4787262105551508952_28369
> > > > 2013-07-31 19:39:50,671 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9000, call blockReceived(DatanodeRegistration(192.168.20.30:50010, storageID=DS-1816106352-192.168.20.30-50010-1369314076237, infoPort=50075, ipcPort=50020), [Lorg.apache.hadoop.hdfs.protocol.Block;@64f2d559, [Ljava.lang.String;@294f9d6) from 192.168.20.30:59764: error: java.io.IOException: Got blockReceived message from unregistered or dead node blk_-4787262105551508952_28369
> > > > java.io.IOException: Got blockReceived message from unregistered or dead node blk_-4787262105551508952_28369
> > > >     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.blockReceived(FSNamesystem.java:4188)
> > > >     at org.apache.hadoop.hdfs.server.namenode.NameNode.blockReceived(NameNode.java:1069)
> > > >     at sun.reflect.GeneratedMethodAccessor56.invoke(Unknown Source)
> > > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > > >     at java.lang.reflect.Method.invoke(Method.java:597)
> > > >     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
> > > >     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
> > > >     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
> > > >     at java.security.AccessController.doPrivileged(Native Method)
> > > >     at javax.security.auth.Subject.doAs(Subject.java:396)
> > > >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
> > > >     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)
> > > >
> > > > --
> > > > Thanks and Regards,
> > > > Vimal Jain
> > >
> >
> > --
> > Thanks and Regards,
> > Vimal Jain
>

--
Thanks and Regards,
Vimal Jain
