Thanks. Please submit a patch and I can try to test it. The Jira is: https://issues.apache.org/jira/browse/HBASE-3722
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Jean-Daniel Cryans
Sent: April 1, 2011 1:20
To: Gaojinchao; [email protected]
Subject: Re: A lot of data is lost when name node crashed

(sending back to the list, please don't answer directly to the sender, always send back to the mailing list)

MasterFileSystem has most of the DFS interactions; it seems that checkFileSystem is never called (it should be), and splitLog catches the ERROR when splitting but doesn't abort. Would you mind opening a jira about this issue and perhaps submitting a patch?

Thx,

J-D

On Thu, Mar 31, 2011 at 5:40 AM, Gaojinchao <[email protected]> wrote:
> Thanks, I will try it again, because last time the INFO log level wasn't turned on.
> I have a question: where in the code does the Master kill itself when it finds the namenode has crashed?
>
> if (isCarryingRoot()) { // -ROOT-
>   try {
>     this.services.getAssignmentManager().assignRoot();
>   } catch (KeeperException e) {
>     this.server.abort("In server shutdown processing, assigning root", e);
>     throw new IOException("Aborting", e);
>   }
> }
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Jean-Daniel Cryans
> Sent: March 30, 2011 1:39
> To: [email protected]
> Cc: Gaojinchao; Chenjian
> Subject: Re: A lot of data is lost when name node crashed
>
> I was expecting it would die, strange it didn't. Could you provide a
> bigger log? This one basically tells us the NN is gone, but that's
> about it. Please put it on a web server or something else that's
> easily reachable for anyone (e.g. don't post the full thing here).
>
> Thx,
>
> J-D
>
> On Tue, Mar 29, 2011 at 4:28 AM, Gaojinchao <[email protected]> wrote:
>> I ran some performance tests on HBase 0.90.1.
>> When the name node crashed, I found that some data was lost.
>> I'm not sure exactly what caused it; it looks like splitting the logs failed.
>> I think the master should shut itself down when HDFS crashes.
>>
>> The logs are:
>>
>> 2011-03-22 13:21:55,056 WARN org.apache.hadoop.hbase.master.LogCleaner: Error while cleaning the logs
>> java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
>>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)
>>         at org.apache.hadoop.ipc.Client.call(Client.java:820)
>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
>>         at $Proxy5.getListing(Unknown Source)
>>         at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>>         at $Proxy5.getListing(Unknown Source)
>>         at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:614)
>>         at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:252)
>>         at org.apache.hadoop.hbase.master.LogCleaner.chore(LogCleaner.java:121)
>>         at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
>>         at org.apache.hadoop.hbase.master.LogCleaner.run(LogCleaner.java:154)
>> Caused by: java.net.ConnectException: Connection refused
>>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>>         at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
>>         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:332)
>>         at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:202)
>>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:943)
>>         at org.apache.hadoop.ipc.Client.call(Client.java:788)
>>         ... 13 more
>> 2011-03-22 13:21:56,056 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 0 time(s).
>> 2011-03-22 13:21:57,057 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 1 time(s).
>> 2011-03-22 13:21:58,057 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 2 time(s).
>> 2011-03-22 13:21:59,057 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 3 time(s).
>> 2011-03-22 13:22:00,058 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 4 time(s).
>> 2011-03-22 13:22:01,058 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 5 time(s).
>> 2011-03-22 13:22:02,059 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 6 time(s).
>> 2011-03-22 13:22:03,059 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 7 time(s).
>> 2011-03-22 13:22:04,059 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 8 time(s).
>> 2011-03-22 13:22:05,060 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 9 time(s).
>> 2011-03-22 13:22:05,060 ERROR org.apache.hadoop.hbase.master.MasterFileSystem: Failed splitting hdfs://C4C1:9000/hbase/.logs/C4C9.site,60020,1300767633398
>> java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
>>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)
>>         at org.apache.hadoop.ipc.Client.call(Client.java:820)
>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
>>         at $Proxy5.getFileInfo(Unknown Source)
>>         at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>>         at $Proxy5.getFileInfo(Unknown Source)
>>         at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:623)
>>         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:461)
>>         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:690)
>>         at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:177)
>>         at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:196)
>>         at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:95)
>>         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>         at java.lang.Thread.run(Thread.java:662)
>> Caused by: java.net.ConnectException: Connection refused
>>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>>         at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
>>         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:332)
>>         at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:202)
>>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:943)
>>         at org.apache.hadoop.ipc.Client.call(Client.java:788)
>>         ... 18 more
>> 2011-03-22 13:22:45,600 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 0 time(s).
>> 2011-03-22 13:22:46,600 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 1 time(s).
>> 2011-03-22 13:22:47,601 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 2 time(s).
>> 2011-03-22 13:22:48,601 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 3 time(s).
>> 2011-03-22 13:22:49,601 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 4 time(s).
>> 2011-03-22 13:22:50,602 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 5 time(s).
>> 2011-03-22 13:22:51,602 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 6 time(s).
>> 2011-03-22 13:22:52,602 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 7 time(s).
>> 2011-03-22 13:22:53,603 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 8 time(s).
>> 2011-03-22 13:22:54,603 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 9 time(s).
>> 2011-03-22 13:22:54,603 WARN org.apache.hadoop.hbase.master.LogCleaner: Error while cleaning the logs
>> java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
>>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)
>>         at org.apache.hadoop.ipc.Client.call(Client.java:820)
>>         at org.apache.hadoop.ipc.RPC$Invok
>>
>
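[Editor's note] To make J-D's point above concrete (checkFileSystem is never called, and splitLog logs the ERROR but doesn't abort), here is a minimal, self-contained sketch of the intended pattern. It is not the actual HBASE-3722 patch: FsProbe, Abortable as declared here, and FileSystemWatchdog are hypothetical stand-ins for the 0.90-era MasterFileSystem.checkFileSystem() and the master's abort hook.

import java.io.IOException;

/**
 * Hypothetical sketch of the "abort the master when HDFS is gone" pattern
 * discussed in the thread. Not HBase code: the nested interfaces stand in
 * for MasterFileSystem.checkFileSystem() and Server.abort(String, Throwable).
 */
public class FileSystemWatchdog {

  /** Stand-in for a cheap DFS liveness probe, e.g. fs.exists(rootDir). */
  interface FsProbe {
    void touchRoot() throws IOException;
  }

  /** Stand-in for the master's abort hook. */
  interface Abortable {
    void abort(String why, Throwable cause);
  }

  /** Stand-in for the log-splitting work, which may fail with an IOException. */
  interface SplitTask {
    void run() throws IOException;
  }

  private final FsProbe probe;
  private final Abortable master;

  FileSystemWatchdog(FsProbe probe, Abortable master) {
    this.probe = probe;
    this.master = master;
  }

  /**
   * Called from error paths such as log splitting. If the filesystem is
   * unreachable, ask the master to abort instead of swallowing the error,
   * so it does not keep assigning regions on top of un-replayed WALs.
   *
   * @return true if the filesystem is still reachable
   */
  boolean checkFileSystem() {
    try {
      probe.touchRoot();
      return true;
    } catch (IOException e) {
      master.abort("Shutting down master: filesystem unavailable", e);
      return false;
    }
  }

  /** Rough shape of the splitLog error path J-D describes. */
  void splitLog(SplitTask task) throws IOException {
    try {
      task.run();
    } catch (IOException e) {
      // Instead of only logging the ERROR, verify HDFS and abort if it is gone.
      if (checkFileSystem()) {
        throw e; // HDFS is reachable, so surface the original split failure
      }
      // checkFileSystem() already requested an abort; nothing more to do here.
    }
  }
}

The design point is the one made in the thread: once the NameNode is unreachable, a master that merely logs the split failure will carry on as if the dead region server's WAL edits never existed, which is how the data loss in the original report can happen; aborting keeps the cluster from silently dropping those edits.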
