On Sun, Aug 7, 2011 at 12:28 PM, Oleg Ruchovets <[email protected]> wrote:
> *here is one of the region server's logs:*
> http://pastebin.com/raw.php?i=VF2bSMYd
>
I see this Oleg: "Caused by: java.lang.OutOfMemoryError: Java heap space"

> Additional information and questions:
> 1) We disabled automatic major compaction and run major compaction
> manually, but in the log file I found this entry:
>
> 2011-08-07 20:57:20,706 INFO org.apache.hadoop.hbase.regionserver.Store:
> Completed major compaction of 5 file(s), new
> file=hdfs://hadoop-master.infolinks.local:8000/hbase/URLS/70c4ed1855cee6201e583662272f7a46/searches/6451756610532158137,
> size=6.7m; total size for store is 6.7m

A minor compaction can be promoted to a major compaction if it ends up
picking all of the store's files to compact (earlier in the log it will
start out as an 'ordinary' compaction and only later become a 'major' one).

> We started major compaction at 00:00 every day, but this log entry's time
> is 20:57:20, so how can I check that the major compaction has finished?

The compaction is async. Currently no flag is set on completion. This is
an issue we need to figure out an answer for.

> And what could be a reason for starting
> 2) There are a lot of exceptions like this:
>
> 2011-08-07 01:22:34,821 WARN org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 16 on 8041 caught: java.nio.channels.ClosedChannelException
>         at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
>         at org.apache.hadoop.hbase.ipc.HBaseServer.channelIO(HBaseServer.java:1387)
>         at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1339)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:727)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:792)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1083)
>
> What does this exception mean, and is it normal behaviour?

The client has given up listening. Do you see a corresponding timeout
around the same time on the client side?

> 3) There are log entries like this:
>
> 2011-08-07 17:14:05,833 INFO org.apache.hadoop.hbase.master.AssignmentManager:
> Regions in transition timed out:
> URLS,20110802_budnmarys.squarespace.com/picture-gallery/miscellaneous-gallery/1138727,1312377360131.e302bc31e326308031a82e9eca6e0b6a.
> state=OFFLINE, ts=1312726415824
> 2011-08-07 17:14:05,833 INFO org.apache.hadoop.hbase.master.AssignmentManager:
> Region has been OFFLINE for too long, reassigning
> URLS,20110802_budnmarys.squarespace.com/picture-gallery/miscellaneous-gallery/1138727,1312377360131.e302bc31e326308031a82e9eca6e0b6a.
> to a random server
> 2011-08-07 17:14:05,833 INFO org.apache.hadoop.hbase.master.AssignmentManager:
> Regions in transition timed out:
> URLS,20110509_e,1305018012046.e48c6df0a31c41f482bcaccf71244ccb.
> state=OFFLINE, ts=1312726415824
> 2011-08-07 17:14:05,833 INFO org.apache.hadoop.hbase.master.AssignmentManager:
> Region has been OFFLINE for too long, reassigning
> URLS,20110509_e,1305018012046.e48c6df0a31c41f482bcaccf71244ccb.
> to a random server
> 2011-08-07 17:14:05,833 INFO org.apache.hadoop.hbase.master.AssignmentManager:
> Regions in transition timed out:
> URLS,20110731_gg,1312187408164.e7fa3b00af458db5af93d5c475712f62.
> state=OFFLINE, ts=1312726415824
>
> What does "Regions in transition timed out" mean, and is it correct
> behaviour?

Does it go on without ever resolving? If so, this is not usually a good
sign. Some recent issues have addressed this, with fixes in 0.90.4, which
should be out soon (check its release notes for related issues).

St.Ack
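
P.S. In case it helps, here is a rough sketch of kicking off the manual
major compaction from the Java client. The call only queues the request --
the compaction itself runs asynchronously, which is why there is no
completion flag to check today; watching for the "Completed major
compaction" lines in the region server logs, as you are already doing, is
the practical way to confirm it finished. The table name 'URLS' is taken
from your logs; everything else is illustrative only.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class ManualMajorCompact {
      public static void main(String[] args) throws Exception {
        // Picks up hbase-site.xml from the classpath.
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        // Queues a major compaction request for every region of the table.
        // Returns immediately; it does NOT wait for the compaction to finish.
        admin.majorCompact("URLS");
        System.out.println("Major compaction requested for URLS");
      }
    }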
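
P.P.S. On the ClosedChannelExceptions: if the client-side logs show call
timeouts at around the same time, one thing to try is raising the client
RPC timeout so slow calls aren't abandoned while the region server is
still writing its response. A minimal sketch follows; I'm quoting the
property name from memory, so please double-check it against the
hbase-default.xml that ships with your version, and treat the 120s value
as nothing more than an example.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;

    public class LongerRpcTimeout {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Default is 60000 ms; a longer timeout keeps the client listening
        // while a slow region server finishes its response.
        conf.setLong("hbase.rpc.timeout", 120000L);
        HTable table = new HTable(conf, "URLS");
        // ... do your gets/puts with this table instance ...
        table.close();
      }
    }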
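
And on the regions-in-transition question: a quick way to see whether the
same regions stay stuck is to poll the master's view of regions in
transition from the client and check whether the set ever drains. Again a
sketch only -- I'm going from memory on the ClusterStatus accessor, so
verify it against the client jar you have (the master web UI shows the
same information):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class RegionsInTransitionWatch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        // Poll a few times, a minute apart; if the same regions keep
        // showing up in the output, they are not being reassigned.
        for (int i = 0; i < 5; i++) {
          System.out.println(admin.getClusterStatus().getRegionsInTransition());
          Thread.sleep(60 * 1000);
        }
      }
    }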
