[jira] [Commented] (HBASE-5075) regionserver crashed and failover
[ https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216672#comment-13216672 ] zhiyuan.dai commented on HBASE-5075: @Jesse Thanks for your reply. I think HBase is an online DB, so how long HBase failover takes is very important. Although kill -9 or a network partition is a big event, the supervisor can detect within milliseconds that its regionserver has crashed, and the HMaster can then move the regions that were open on the crashed regionserver to other live regionservers. This brings the failover time down to an acceptable level. As stack and Lars said, the shutdown hook is only called when the regionserver process is still alive and the program logic isn't interrupted. kill -9 does not trigger the shutdown hook, so the method deleteMyEphemeralNode is never executed, in which case we'd need to rely on the ZK timeout. My patch is meant to reduce the failover time, which improves the availability of HBase. We have some big online HBase clusters that all serve core applications, and the acceptable failover time for these applications is about 10s~20s, which includes splitting the hlog, recovering the hlog lease, and the 'zk timeout'. regionserver crashed and failover - Key: HBASE-5075 URL: https://issues.apache.org/jira/browse/HBASE-5075 Project: HBase Issue Type: Improvement Components: monitoring, regionserver, replication, zookeeper Affects Versions: 0.92.1 Reporter: zhiyuan.dai Fix For: 0.90.5 Attachments: Degion of Failure Detection.pdf, HBase-5075-shell.patch, HBase-5075-src.patch When a regionserver crashes, it takes too long to notify the HMaster, and once the HMaster learns that the regionserver has shut down, it takes a long time to obtain the hlog's lease. HBase is an online DB, so availability is very important. I have an idea to improve availability: a monitor node checks the regionserver's pid. If the pid no longer exists, I treat the RS as down, delete its znode, and force-close the hlog file, so the detection period could be around 100ms.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
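The pid-monitor idea in the HBASE-5075 description can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not the attached patch: the class name PidWatchdog and its cleanup callback are hypothetical, the default liveness check assumes JDK 9+ `ProcessHandle`, and the real cleanup (deleting the regionserver's znode and force-closing its hlog) is left as a caller-supplied Runnable rather than invented ZooKeeper or HDFS calls.

```java
import java.util.Optional;
import java.util.function.LongPredicate;

/**
 * Hypothetical watchdog for the HBASE-5075 idea: poll whether a regionserver
 * process is alive and run a cleanup action once, the first time it is not.
 * In the proposal the cleanup would delete the RS znode and force-close its
 * hlog; here it is just a Runnable supplied by the caller.
 */
class PidWatchdog {
    private final long pid;
    private final LongPredicate isAlive; // liveness check, injectable for tests
    private final Runnable onDeath;      // hypothetical cleanup hook
    private boolean fired = false;

    PidWatchdog(long pid, LongPredicate isAlive, Runnable onDeath) {
        this.pid = pid;
        this.isAlive = isAlive;
        this.onDeath = onDeath;
    }

    /** Default liveness check using the JDK 9+ ProcessHandle API (same host only). */
    static boolean processAlive(long pid) {
        Optional<ProcessHandle> handle = ProcessHandle.of(pid);
        return handle.map(ProcessHandle::isAlive).orElse(false);
    }

    /** Performs one poll; returns true only on the call that ran the cleanup. */
    synchronized boolean checkOnce() {
        if (fired || isAlive.test(pid)) {
            return false;
        }
        fired = true;
        onDeath.run();
        return true;
    }

    synchronized boolean hasFired() {
        return fired;
    }

    /** Polls every intervalMillis (the proposal mentions ~100 ms) until the cleanup has run. */
    void run(long intervalMillis) throws InterruptedException {
        while (!hasFired()) {
            if (!checkOnce()) {
                Thread.sleep(intervalMillis);
            }
        }
    }
}
```

A local pid check only observes process death on the same host; it cannot see a network partition, where the ZK session timeout remains the fallback.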
[jira] [Commented] (HBASE-3909) Add dynamic config
[ https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216758#comment-13216758 ] Jimmy Xiang commented on HBASE-3909: @Stack, we don't have to poll the fs to find changes. We can just put the last-modified date of the file in ZK. Once the last-modified date changes, we can load the file again. When a new regionserver joins a cluster, it should always check whether any configuration has changed based on the configuration file's last-modified date, which acts as a kind of version number for the file. Add dynamic config -- Key: HBASE-3909 URL: https://issues.apache.org/jira/browse/HBASE-3909 Project: HBase Issue Type: Bug Reporter: stack Fix For: 0.94.0 I'm sure this issue exists already, at least as part of the discussion around making online schema edits possible, but there's no harm in this having its own issue. Ted started a conversation on this topic up on dev and Todd suggested we look at how Hadoop did it over in HADOOP-7001
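Jimmy Xiang's suggestion, using the config file's last-modified date as a version number published in ZK, can be sketched as below. This is a hypothetical illustration: VersionedConfigReloader is an invented name, and the two suppliers stand in for reading the timestamp from ZK and re-reading the file, so no ZooKeeper or Hadoop API is assumed.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical reloader for the HBASE-3909 suggestion: the config file's
 * last-modified time acts as a version number. publishedVersion stands in for
 * reading that timestamp from ZK; loader stands in for re-reading the file.
 */
class VersionedConfigReloader<C> {
    private final LongSupplier publishedVersion;
    private final Supplier<C> loader;
    private boolean loaded = false;
    private long loadedVersion;
    private C current;

    VersionedConfigReloader(LongSupplier publishedVersion, Supplier<C> loader) {
        this.publishedVersion = publishedVersion;
        this.loader = loader;
    }

    /**
     * Reloads only if the published version differs from the one last loaded.
     * Call at startup (e.g. a regionserver joining) and on each ZK change event.
     */
    synchronized boolean refreshIfChanged() {
        // Read the version before loading, so an edit made during the load
        // is picked up by the next refresh.
        long version = publishedVersion.getAsLong();
        if (loaded && version == loadedVersion) {
            return false; // nothing changed since the last load
        }
        current = loader.get();
        loadedVersion = version;
        loaded = true;
        return true;
    }

    synchronized C current() {
        return current;
    }
}
```

Calling refreshIfChanged() once at startup covers a new regionserver joining the cluster, and calling it again from a ZK watch notification covers later edits without polling the filesystem.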
[jira] [Commented] (HBASE-5442) Use builder pattern in StoreFile and HFile
[ https://issues.apache.org/jira/browse/HBASE-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216776#comment-13216776 ] Phabricator commented on HBASE-5442: Kannan has accepted the revision [jira] [HBASE-5442] [89-fb] Use builder pattern in StoreFile and HFile. looks great! REVISION DETAIL https://reviews.facebook.net/D1941 BRANCH hfile_builder10 Use builder pattern in StoreFile and HFile -- Key: HBASE-5442 URL: https://issues.apache.org/jira/browse/HBASE-5442 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Fix For: 0.94.0 Attachments: D1893.1.patch, D1893.2.patch, D1941.1.patch, D1941.2.patch, D1941.3.patch, D1941.4.patch, HFile-StoreFile-builder-2012-02-22_22_49_00.patch We have five ways to create an HFile writer, two ways to create a StoreFile writer, and the sets of parameters keep changing, creating a lot of confusion, especially when porting patches across branches. The same thing is happening to HColumnDescriptor. I think we should move to a builder pattern solution, e.g.
{code:java}
HFileWriter w = HFile.getWriterBuilder(conf, some common args)
    .setParameter1(value1)
    .setParameter2(value2)
    ...
    .build();
{code}
Each parameter setter being on its own line will make merges/cherry-picks work properly, we will not have to even mention default parameters again, and we can eliminate a dozen impossible-to-remember constructors. This particular JIRA addresses StoreFile and HFile refactoring. For HColumnDescriptor refactoring see HBASE-5357.
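For readers unfamiliar with the pattern, here is a small, self-contained Java builder in the spirit of the proposal. WriterOptions and its fields (path, block size, compression name, bloom filter flag) are invented for illustration and are not the actual HFile or StoreFile writer parameters; the point is that defaults live in one place and each optional setting is a single chained call.

```java
/**
 * Hypothetical options object built with the builder pattern. The fields are
 * invented for illustration; they are not HFile's real writer parameters.
 */
final class WriterOptions {
    private final String path;
    private final int blockSize;
    private final String compression;
    private final boolean bloomFilter;

    private WriterOptions(Builder b) {
        this.path = b.path;
        this.blockSize = b.blockSize;
        this.compression = b.compression;
        this.bloomFilter = b.bloomFilter;
    }

    /** Required arguments go here; everything else has a default. */
    static Builder builder(String path) {
        return new Builder(path);
    }

    String path() { return path; }
    int blockSize() { return blockSize; }
    String compression() { return compression; }
    boolean bloomFilter() { return bloomFilter; }

    static final class Builder {
        private final String path;
        // Defaults are declared once, so callers never repeat them.
        private int blockSize = 64 * 1024;
        private String compression = "NONE";
        private boolean bloomFilter = false;

        private Builder(String path) {
            this.path = path;
        }

        Builder blockSize(int bytes) {
            if (bytes <= 0) {
                throw new IllegalArgumentException("blockSize must be positive");
            }
            this.blockSize = bytes;
            return this;
        }

        Builder compression(String codecName) {
            this.compression = codecName;
            return this;
        }

        Builder bloomFilter(boolean enabled) {
            this.bloomFilter = enabled;
            return this;
        }

        WriterOptions build() {
            return new WriterOptions(this);
        }
    }
}
```

A call such as `WriterOptions.builder(path).compression("GZ").build()` leaves the block size and bloom filter at their defaults, and adding a new optional parameter means one new field and setter rather than another constructor overload.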
[jira] [Created] (HBASE-5480) Fixups to MultithreadedTableMapper for Hadoop 0.23.2+
Fixups to MultithreadedTableMapper for Hadoop 0.23.2+ - Key: HBASE-5480 URL: https://issues.apache.org/jira/browse/HBASE-5480 Project: HBase Issue Type: Bug Components: mapreduce Reporter: Andrew Purtell Priority: Critical There are two issues:
- StatusReporter has a new method getProgress()
- Mapper and reducer context objects can no longer be directly instantiated.
See attached patch. I'm not thrilled with the added reflection but it was the minimally intrusive change. Raised the priority to critical because compilation fails.
[jira] [Updated] (HBASE-5480) Fixups to MultithreadedTableMapper for Hadoop 0.23.2+
[ https://issues.apache.org/jira/browse/HBASE-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-5480: -- Attachment: HBASE-5480.patch Fixups to MultithreadedTableMapper for Hadoop 0.23.2+ - Key: HBASE-5480 URL: https://issues.apache.org/jira/browse/HBASE-5480 Project: HBase Issue Type: Bug Components: mapreduce Reporter: Andrew Purtell Priority: Critical Attachments: HBASE-5480.patch
[jira] [Updated] (HBASE-5480) Fixups to MultithreadedTableMapper for Hadoop 0.23.2+
[ https://issues.apache.org/jira/browse/HBASE-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-5480: -- Attachment: (was: HBASE-5480.patch) Fixups to MultithreadedTableMapper for Hadoop 0.23.2+ - Key: HBASE-5480 URL: https://issues.apache.org/jira/browse/HBASE-5480 Project: HBase Issue Type: Bug Components: mapreduce Reporter: Andrew Purtell Priority: Critical Attachments: HBASE-5480.patch
[jira] [Updated] (HBASE-5480) Fixups to MultithreadedTableMapper for Hadoop 0.23.2+
[ https://issues.apache.org/jira/browse/HBASE-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-5480: -- Attachment: HBASE-5480.patch Corrected patch with --no-prefix Fixups to MultithreadedTableMapper for Hadoop 0.23.2+ - Key: HBASE-5480 URL: https://issues.apache.org/jira/browse/HBASE-5480 Project: HBase Issue Type: Bug Components: mapreduce Reporter: Andrew Purtell Priority: Critical Attachments: HBASE-5480.patch
[jira] [Commented] (HBASE-5480) Fixups to MultithreadedTableMapper for Hadoop 0.23.2+
[ https://issues.apache.org/jira/browse/HBASE-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216861#comment-13216861 ] stack commented on HBASE-5480: -- +1 Looks grand Andy. Is the reflection per map invocation? So, per row? I suppose in the scheme of things it's not too bad. Fixups to MultithreadedTableMapper for Hadoop 0.23.2+ - Key: HBASE-5480 URL: https://issues.apache.org/jira/browse/HBASE-5480 Project: HBase Issue Type: Bug Components: mapreduce Reporter: Andrew Purtell Priority: Critical Attachments: HBASE-5480.patch
[jira] [Commented] (HBASE-5075) regionserver crashed and failover
[ https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216863#comment-13216863 ] stack commented on HBASE-5075: -- @zhiyuan.dai What do you think of the idea of using supervisor or one of the other babysitting programs instead of writing our own from scratch? If you need HBase regionservers to dump out their servername so you know what to kill up in zk, that can be done easily enough. regionserver crashed and failover - Key: HBASE-5075 URL: https://issues.apache.org/jira/browse/HBASE-5075 Project: HBase Issue Type: Improvement Components: monitoring, regionserver, replication, zookeeper Affects Versions: 0.92.1 Reporter: zhiyuan.dai Fix For: 0.90.5 Attachments: Degion of Failure Detection.pdf, HBase-5075-shell.patch, HBase-5075-src.patch
[jira] [Commented] (HBASE-5480) Fixups to MultithreadedTableMapper for Hadoop 0.23.2+
[ https://issues.apache.org/jira/browse/HBASE-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216957#comment-13216957 ] Andrew Purtell commented on HBASE-5480: --- There was a constructor called at that site already, and another constructor called by reflection already above it. This only adds a small incremental cost. Fixups to MultithreadedTableMapper for Hadoop 0.23.2+ - Key: HBASE-5480 URL: https://issues.apache.org/jira/browse/HBASE-5480 Project: HBase Issue Type: Bug Components: mapreduce Reporter: Andrew Purtell Priority: Critical Attachments: HBASE-5480.patch
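The per-row cost question in this thread comes down to how often the reflective lookup happens. A minimal sketch, assuming the lookup is done once and only `Constructor.newInstance` runs per record: CachedConstructor and firstAvailable are hypothetical helpers, and the candidate class names stand in for whichever context implementation a given Hadoop version ships. This is not the code in HBASE-5480.patch.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;

/**
 * Hypothetical helper: resolve a constructor reflectively once, then reuse it.
 * The lookup (Class.forName + getConstructor) is the expensive part; per-record
 * work is a single Constructor.newInstance call.
 */
final class CachedConstructor<T> {
    private final Constructor<? extends T> ctor;

    CachedConstructor(Class<? extends T> type, Class<?>... paramTypes) {
        try {
            this.ctor = type.getConstructor(paramTypes);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no public constructor on " + type.getName(), e);
        }
    }

    /** Picks the first candidate class present on the classpath, e.g. old vs. new API. */
    static <T> CachedConstructor<T> firstAvailable(Class<T> base, Class<?>[] paramTypes,
                                                   String... classNames) {
        for (String name : classNames) {
            Class<?> candidate;
            try {
                candidate = Class.forName(name);
            } catch (ClassNotFoundException e) {
                continue; // not in this version; try the next candidate
            }
            return new CachedConstructor<T>(candidate.asSubclass(base), paramTypes);
        }
        throw new IllegalStateException("none of the candidate classes were found");
    }

    /** Cheap per-call path: no lookup, just the cached constructor. */
    T newInstance(Object... args) {
        try {
            return ctor.newInstance(args);
        } catch (InvocationTargetException e) {
            throw new IllegalStateException("constructor threw", e.getCause());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + ctor.getDeclaringClass().getName(), e);
        }
    }
}
```

Resolving the candidate once at setup keeps the per-row overhead to one reflective call, which is in line with Andrew's point that the change only adds a small incremental cost.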
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216989#comment-13216989 ] chunhui shen commented on HBASE-5270: - bq. don't allow it to serve traffic before the actual server is ready. I think that's inconvenient. For example, before the master is fully initialized, we need to allow RegionserverReport but disallow admin operations. Also, server death is detected through ZK, not RPC. Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler - Key: HBASE-5270 URL: https://issues.apache.org/jira/browse/HBASE-5270 Project: HBase Issue Type: Sub-task Components: master Reporter: Zhihong Yu Assignee: chunhui shen Fix For: 0.92.1, 0.94.0 Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 5270-testcasev2.patch, hbase-5270.patch, hbase-5270v2.patch, hbase-5270v4.patch, hbase-5270v5.patch, hbase-5270v6.patch, sampletest.txt This JIRA continues the effort from HBASE-5179. Starting with Stack's comments about patches for 0.92 and TRUNK: Reviewing 0.92v17 isDeadServerInProgress is a new public method in ServerManager but it does not seem to be used anywhere. Does isDeadRootServerInProgress need to be public? Ditto for meta version. This method's param names are not right: 'definitiveRootServer'; what is meant by definitive? Do they need this qualifier? Is there anything in place to stop us expiring a server twice if it's carrying root and meta? What is the difference between asking the assignment manager isCarryingRoot and this variable that is passed in? Should be doc'd at least. Ditto for meta. I think I've asked for this a few times - onlineServers needs to be explained... either in javadoc or in a comment. This is the param passed into joinCluster. How does it arise? I think I know but am unsure. God love the poor noob that comes awandering this code trying to make sense of it all.
It looks like we get the list by trawling zk for regionserver znodes that have not checked in. Don't we do this operation earlier in master setup? Are we doing it again here? Though distributed split log is configured, we will do single-process splitting in the master under some conditions with this patch. It's not explained in the code why we would do this. Why do we think master log splitting is 'high priority' when it could very well be slower? Should we only go this route if distributed splitting is not going on? Do we know if concurrent distributed log splitting and master splitting works? Why would we have dead servers in progress here in master startup? Because a servershutdownhandler fired? This patch is different from the patch for 0.90. It should go into trunk first with tests, then 0.92. Should it be in this issue? This issue is really hard to follow now. Maybe this issue is for 0.90.x and a new issue for more work on this trunk patch? This patch needs to have the v18 differences applied.
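chunhui shen's distinction (serve regionserver reports but reject admin operations until the master is fully initialized) can be illustrated with a tiny gate. This is a hypothetical sketch: MasterRpcGate, RpcKind and markInitialized() are invented names and do not correspond to HMaster's actual RPC methods.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical gate illustrating the point above: during master startup,
 * regionserver reports are served but admin operations are rejected.
 * Names are invented; this is not HMaster's real RPC surface.
 */
class MasterRpcGate {
    enum RpcKind { REGIONSERVER_REPORT, ADMIN }

    private final AtomicBoolean initialized = new AtomicBoolean(false);

    /** Called once the master has finished its startup/failover processing. */
    void markInitialized() {
        initialized.set(true);
    }

    /** Throws if this kind of call must wait until initialization completes. */
    void check(RpcKind kind) {
        if (kind == RpcKind.ADMIN && !initialized.get()) {
            throw new IllegalStateException("master is initializing; admin operation rejected");
        }
        // REGIONSERVER_REPORT is always allowed so the master can learn
        // which servers are alive while it starts up.
    }
}
```

Server death itself does not pass through this gate, since, as noted above, it is detected through ZK rather than RPC.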
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216994#comment-13216994 ] chunhui shen commented on HBASE-5270: - @stack Could you take a look at introducing a safemode that delays SSH until after the master is initialized? I think this solution is simpler for this issue. Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler - Key: HBASE-5270 URL: https://issues.apache.org/jira/browse/HBASE-5270 Project: HBase Issue Type: Sub-task Components: master Reporter: Zhihong Yu Assignee: chunhui shen Fix For: 0.92.1, 0.94.0 Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 5270-testcasev2.patch, hbase-5270.patch, hbase-5270v2.patch, hbase-5270v4.patch, hbase-5270v5.patch, hbase-5270v6.patch, sampletest.txt
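A sketch of the proposed safemode, assuming the idea is to queue server-shutdown handling that arrives while the master is still initializing and replay it afterwards. ShutdownHandlerSafeMode and its methods are hypothetical; the handler runs under the lock purely to keep the example short, whereas real code would more likely hand the work to an executor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical "safemode" for server shutdown handling (SSH): deaths reported
 * while the master is initializing are queued, then processed in arrival
 * order once initialization completes, so they cannot race with the master's
 * own startup failover processing. Not HBase code; names are invented.
 */
class ShutdownHandlerSafeMode {
    private final Consumer<String> handler; // processes one dead server
    private final List<String> deferred = new ArrayList<>();
    private boolean inSafeMode = true;

    ShutdownHandlerSafeMode(Consumer<String> handler) {
        this.handler = handler;
    }

    /** Called when a regionserver is detected dead (e.g. its znode expired). */
    synchronized void serverDied(String serverName) {
        if (inSafeMode) {
            deferred.add(serverName); // hold until the master is ready
        } else {
            handler.accept(serverName);
        }
    }

    /** Called once the master has finished its own startup processing. */
    synchronized void leaveSafeMode() {
        inSafeMode = false;
        // Runs under the lock for simplicity; real code would likely use an executor.
        for (String serverName : deferred) {
            handler.accept(serverName);
        }
        deferred.clear();
    }
}
```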
[jira] [Commented] (HBASE-4523) dfs.support.append config should be present in the hadoop configs, we should remove them from hbase so the user is not confused when they see the config in 2 places
[ https://issues.apache.org/jira/browse/HBASE-4523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216997#comment-13216997 ] Luke Lu commented on HBASE-4523: This doesn't look correct to me, as dfs.support.append is required for syncFs detection, independent of hdfs. I'd be fine with syncFs detection being on all the time, though. dfs.support.append config should be present in the hadoop configs, we should remove them from hbase so the user is not confused when they see the config in 2 places Key: HBASE-4523 URL: https://issues.apache.org/jira/browse/HBASE-4523 Project: HBase Issue Type: Bug Affects Versions: 0.90.4, 0.92.0 Reporter: Arpit Gupta Assignee: Eric Yang Fix For: 0.92.1 Attachments: HBASE-4523.patch
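To make the dependency Luke Lu describes concrete, here is a hedged sketch assuming 0.90.x-style detection: the dfs.support.append flag is read first, and only if it is true is the writer class probed reflectively for a public no-argument syncFs method. A plain Map stands in for Hadoop's Configuration, and SyncFsDetector is a hypothetical name, not the HBase code.

```java
import java.util.Map;

/**
 * Hypothetical sketch of flag-gated syncFs detection: the probe only runs
 * when dfs.support.append is true. A Map stands in for Hadoop's Configuration,
 * and writerClass stands in for whatever writer class would be probed.
 */
class SyncFsDetector {
    static boolean isSyncFsSupported(Map<String, String> conf, Class<?> writerClass) {
        boolean append = Boolean.parseBoolean(conf.getOrDefault("dfs.support.append", "false"));
        if (!append) {
            return false; // detection is skipped entirely when the flag is off
        }
        try {
            // Assumed probe: look for a public, no-argument syncFs() method.
            writerClass.getMethod("syncFs");
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
}
```

Under this assumption, if HBase stopped reading dfs.support.append, the first check would always fail and syncFs would never be detected, unless detection were made unconditional as suggested above.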
[jira] [Updated] (HBASE-5462) [monitor] Ganglia metric hbase.master.cluster_requests should exclude the scan meta request generated by master, or create a new metric which could show the real request
[ https://issues.apache.org/jira/browse/HBASE-5462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] johnyang updated HBASE-5462: Affects Version/s: 0.90.5 0.92.0 [monitor] Ganglia metric hbase.master.cluster_requests should exclude the scan meta request generated by master, or create a new metric which could show the real request from client - Key: HBASE-5462 URL: https://issues.apache.org/jira/browse/HBASE-5462 Project: HBase Issue Type: Bug Components: monitoring Affects Versions: 0.90.5, 0.92.0 Environment: hbase 0.90.5 Reporter: johnyang Original Estimate: 48h Remaining Estimate: 48h We have a big table with 30k regions, but its request rate is not very high (about 50K per day). We use the hbase.master.cluster_requests metric to monitor cluster requests, but found that many of the requests are generated by the master, which scans the meta table at regular intervals. This makes it hard for us to monitor the real requests from clients. Is it possible to filter out the meta table scans, or to create a new metric that shows only the real requests from clients? Thank you.
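One possible shape for the requested metric, sketched under the assumption that requests can be classified by the table they target: requests against the catalog tables (-ROOT- and .META. in 0.90/0.92) are counted separately so a client-only figure can be derived. RequestCounters is a hypothetical class, not an HBase metrics API. Classifying by table would also exclude clients' own .META. lookups, so tagging requests by origin may be the more precise alternative.

```java
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical counters that separate catalog-table requests from the rest,
 * so a "client requests" figure can be reported without the master's
 * periodic .META. scans inflating it.
 */
class RequestCounters {
    private static final Set<String> CATALOG_TABLES = Set.of("-ROOT-", ".META.");

    private final AtomicLong total = new AtomicLong();
    private final AtomicLong catalog = new AtomicLong();

    void record(String tableName) {
        total.incrementAndGet(); // always incremented before catalog
        if (CATALOG_TABLES.contains(tableName)) {
            catalog.incrementAndGet();
        }
    }

    long totalRequests() {
        return total.get();
    }

    /** Approximate under concurrency, but never negative: catalog is read first. */
    long clientRequests() {
        long c = catalog.get();
        long t = total.get();
        return t - c;
    }
}
```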
[jira] [Commented] (HBASE-4523) dfs.support.append config should be present in the hadoop configs, we should remove them from hbase so the user is not confused when they see the config in 2 places
[ https://issues.apache.org/jira/browse/HBASE-4523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217010#comment-13217010 ] Luke Lu commented on HBASE-4523: My comment above only applies to 0.90.x; 0.92.x has HBASE-2233, which got rid of dfs.support.append in the HBase code. dfs.support.append config should be present in the hadoop configs, we should remove them from hbase so the user is not confused when they see the config in 2 places Key: HBASE-4523 URL: https://issues.apache.org/jira/browse/HBASE-4523 Project: HBase Issue Type: Bug Affects Versions: 0.90.4, 0.92.0 Reporter: Arpit Gupta Assignee: Eric Yang Fix For: 0.92.1 Attachments: HBASE-4523.patch
[jira] [Commented] (HBASE-5075) regionserver crashed and failover
[ https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217014#comment-13217014 ] zhiyuan.dai commented on HBASE-5075: @stack First, thank you. Sorry, I don't quite understand what you mean. Do you mean a separate project instead of writing the code into HBase? regionserver crashed and failover - Key: HBASE-5075 URL: https://issues.apache.org/jira/browse/HBASE-5075 Project: HBase Issue Type: Improvement Components: monitoring, regionserver, replication, zookeeper Affects Versions: 0.92.1 Reporter: zhiyuan.dai Fix For: 0.90.5 Attachments: Degion of Failure Detection.pdf, HBase-5075-shell.patch, HBase-5075-src.patch
[jira] [Created] (HBASE-5481) Uncaught UnknownHostException prevents HBase from starting
Uncaught UnknownHostException prevents HBase from starting -- Key: HBASE-5481 URL: https://issues.apache.org/jira/browse/HBASE-5481 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.92.0 Reporter: Benoit Sigoure Assignee: Benoit Sigoure If a host gets decommissioned and its hostname no longer resolves, and it was previously hosting ROOT or META, HBase won't be able to start up. This easily happens when moving across networks (e.g. developing HBase on a laptop), but can also happen during cluster-wide maintenance where HBase is shut down, then one or more nodes get decommissioned such that their hostnames no longer resolve.
{code}
2012-02-26 20:05:48,339 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region -ROOT-,,0.70236052 to nowwhat.tsunanet.net,54092,1330315542087
[...]
2012-02-26 20:05:48,456 INFO org.apache.hadoop.hbase.regionserver.HRegion: Onlined -ROOT-,,0.70236052; next sequenceid=268
2012-02-26 20:05:48,456 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:54092-0x135bcfbb0580001 Attempting to transition node 70236052/-ROOT- from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING
2012-02-26 20:05:48,458 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:54092-0x135bcfbb0580001 Successfully transitioned node 70236052 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING
2012-02-26 20:05:48,459 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=nowwhat.tsunanet.net,54092,1330315542087, region=70236052/-ROOT-
2012-02-26 20:05:48,459 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Post open deploy tasks for region=-ROOT-,,0.70236052, daughter=false
2012-02-26 20:05:48,460 INFO org.apache.hadoop.hbase.catalog.RootLocationEditor: Setting ROOT region location in ZooKeeper as nowwhat.tsunanet.net,54092,1330315542087
2012-02-26 20:05:48,466 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Done with post open deploy task for region=-ROOT-,,0.70236052, daughter=false
2012-02-26 20:05:48,466 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:54092-0x135bcfbb0580001 Attempting to transition node 70236052/-ROOT- from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED
2012-02-26 20:05:48,468 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:54092-0x135bcfbb0580001 Successfully transitioned node 70236052 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED
2012-02-26 20:05:48,468 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: region transitioned to opened in zookeeper: {NAME = '-ROOT-,,0', STARTKEY = '', ENDKEY = '', ENCODED = 70236052,}, server: nowwhat.tsunanet.net,54092,1330315542087
2012-02-26 20:05:48,468 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened -ROOT-,,0.70236052 on server:nowwhat.tsunanet.net,54092,1330315542087
2012-02-26 20:05:48,468 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=nowwhat.tsunanet.net,54092,1330315542087, region=70236052/-ROOT-
2012-02-26 20:05:48,470 INFO org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for -ROOT-,,0.70236052 from nowwhat.tsunanet.net,54092,1330315542087; deleting unassigned node
2012-02-26 20:05:48,470 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:54081-0x135bcfbb058 Deleting existing unassigned node for 70236052 that is in expected state RS_ZK_REGION_OPENED
2012-02-26 20:05:48,472 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: The znode of region -ROOT-,,0.70236052 has been deleted.
2012-02-26 20:05:48,472 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:54081-0x135bcfbb058 Successfully deleted unassigned node for region 70236052 in expected state RS_ZK_REGION_OPENED
2012-02-26 20:05:48,472 INFO org.apache.hadoop.hbase.master.AssignmentManager: The master has opened the region -ROOT-,,0.70236052 that was online on nowwhat.tsunanet.net,54092,1330315542087
2012-02-26 20:05:48,473 INFO org.apache.hadoop.hbase.master.HMaster: -ROOT- assigned=1, rit=false, location=nowwhat.tsunanet.net,54092,1330315542087
2012-02-26 20:05:48,486 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Lookedup root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@16d0a6a3; serverName=nowwhat.tsunanet.net,54092,1330315542087
2012-02-26 20:05:48,488 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Lookedup root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@16d0a6a3; serverName=nowwhat.tsunanet.net,54092,1330315542087
2012-02-26 20:05:48,620 FATAL org.apache.hadoop.hbase.master.HMaster: Master
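The fix the issue asks for amounts to treating an unresolvable former location as a dead server rather than letting the exception escape and abort startup. A hypothetical sketch of that decision, not the attached 0001-Properly-handle-UnknownHostException patch: CatalogLocationCheck, Resolver and Decision are invented names, and name resolution is injected so the logic can be exercised without DNS.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/**
 * Hypothetical check for where a catalog region (ROOT/META) was last hosted:
 * an unresolvable hostname means "that server is gone, reassign the region"
 * instead of an uncaught exception. Not the attached patch; names are invented.
 */
class CatalogLocationCheck {
    /** Resolution is injectable so the decision logic can be tested offline. */
    interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    enum Decision { VERIFY_ON_SERVER, REASSIGN }

    static Decision decide(String lastHost, Resolver resolver) {
        try {
            resolver.resolve(lastHost);
            return Decision.VERIFY_ON_SERVER; // host exists; still check the region is really there
        } catch (UnknownHostException e) {
            // Decommissioned host, or we moved networks: the old location is unusable.
            return Decision.REASSIGN;
        }
    }

    /** Default: plain JDK name resolution. */
    static Decision decide(String lastHost) {
        return decide(lastHost, InetAddress::getByName);
    }
}
```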
[jira] [Updated] (HBASE-5481) Uncaught UnknownHostException prevents HBase from starting
[ https://issues.apache.org/jira/browse/HBASE-5481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoit Sigoure updated HBASE-5481: -- Status: Patch Available (was: Open) Uncaught UnknownHostException prevents HBase from starting -- Key: HBASE-5481 URL: https://issues.apache.org/jira/browse/HBASE-5481 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.92.0 Reporter: Benoit Sigoure Assignee: Benoit Sigoure Attachments: 0001-Properly-handle-UnknownHostException-when-checking-M.patch If a host gets decommissioned and its hostname no longer resolves, and it was previously hosting ROOT or META, HBase won't be able to start up. This easily happens when moving across networks (e.g. developing HBase on a laptop), but can also happen during cluster-wide maintenances where HBase is shut down, then one or more nodes get decommissioned such that their hostnames no longer resolve. {code} 2012-02-26 20:05:48,339 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region -ROOT-,,0.70236052 to nowwhat.tsunanet.net,54092,1330315542087 [...] 
2012-02-26 20:05:48,456 INFO org.apache.hadoop.hbase.regionserver.HRegion: Onlined -ROOT-,,0.70236052; next sequenceid=268
2012-02-26 20:05:48,456 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:54092-0x135bcfbb0580001 Attempting to transition node 70236052/-ROOT- from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING
2012-02-26 20:05:48,458 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:54092-0x135bcfbb0580001 Successfully transitioned node 70236052 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING
2012-02-26 20:05:48,459 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=nowwhat.tsunanet.net,54092,1330315542087, region=70236052/-ROOT-
2012-02-26 20:05:48,459 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Post open deploy tasks for region=-ROOT-,,0.70236052, daughter=false
2012-02-26 20:05:48,460 INFO org.apache.hadoop.hbase.catalog.RootLocationEditor: Setting ROOT region location in ZooKeeper as nowwhat.tsunanet.net,54092,1330315542087
2012-02-26 20:05:48,466 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Done with post open deploy task for region=-ROOT-,,0.70236052, daughter=false
2012-02-26 20:05:48,466 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:54092-0x135bcfbb0580001 Attempting to transition node 70236052/-ROOT- from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED
2012-02-26 20:05:48,468 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:54092-0x135bcfbb0580001 Successfully transitioned node 70236052 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED
2012-02-26 20:05:48,468 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: region transitioned to opened in zookeeper: {NAME = '-ROOT-,,0', STARTKEY = '', ENDKEY = '', ENCODED = 70236052,}, server: nowwhat.tsunanet.net,54092,1330315542087
2012-02-26 20:05:48,468 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened -ROOT-,,0.70236052 on server:nowwhat.tsunanet.net,54092,1330315542087
2012-02-26 20:05:48,468 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=nowwhat.tsunanet.net,54092,1330315542087, region=70236052/-ROOT-
2012-02-26 20:05:48,470 INFO org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for -ROOT-,,0.70236052 from nowwhat.tsunanet.net,54092,1330315542087; deleting unassigned node
2012-02-26 20:05:48,470 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:54081-0x135bcfbb058 Deleting existing unassigned node for 70236052 that is in expected state RS_ZK_REGION_OPENED
2012-02-26 20:05:48,472 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: The znode of region -ROOT-,,0.70236052 has been deleted.
2012-02-26 20:05:48,472 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:54081-0x135bcfbb058 Successfully deleted unassigned node for region 70236052 in expected state RS_ZK_REGION_OPENED
2012-02-26 20:05:48,472 INFO org.apache.hadoop.hbase.master.AssignmentManager: The master has opened the region -ROOT-,,0.70236052 that was online on nowwhat.tsunanet.net,54092,1330315542087
2012-02-26 20:05:48,473 INFO org.apache.hadoop.hbase.master.HMaster: -ROOT- assigned=1, rit=false, location=nowwhat.tsunanet.net,54092,1330315542087
2012-02-26 20:05:48,486 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Lookedup root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@16d0a6a3;
[jira] [Updated] (HBASE-5481) Uncaught UnknownHostException prevents HBase from starting
[ https://issues.apache.org/jira/browse/HBASE-5481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benoit Sigoure updated HBASE-5481:
--
Attachment: 0001-Properly-handle-UnknownHostException-when-checking-M.patch

Proposed patch to fix the issue.
[jira] [Commented] (HBASE-5481) Uncaught UnknownHostException prevents HBase from starting
[ https://issues.apache.org/jira/browse/HBASE-5481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217025#comment-13217025 ]

Hadoop QA commented on HBASE-5481:
--
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12516136/0001-Properly-handle-UnknownHostException-when-checking-M.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

-1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1051//console

This message is automatically generated.
[jira] [Commented] (HBASE-5481) Uncaught UnknownHostException prevents HBase from starting
[ https://issues.apache.org/jira/browse/HBASE-5481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217032#comment-13217032 ]

Zhihong Yu commented on HBASE-5481:
---
Patch looks reasonable.
But a patch for TRUNK should be generated separately.
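As the issue title says, the startup failure comes from an UnknownHostException that goes uncaught when the old ROOT/META host no longer resolves. The sketch below is not the attached patch; it only illustrates the general idea of turning an unresolvable hostname into a "location not usable" answer instead of an exception. The class and method names (`CatalogLocationCheck`, `hostResolves`, `previousLocationUsable`) are hypothetical; only `InetAddress.getByName` and `UnknownHostException` are real JDK APIs.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/**
 * Hypothetical helper, not the HBASE-5481 patch: decide whether the server
 * that last hosted -ROOT- or .META. can still be contacted, without letting
 * an UnknownHostException abort master startup.
 */
class CatalogLocationCheck {

  /** Returns true if the hostname resolves; never propagates UnknownHostException. */
  static boolean hostResolves(String hostname) {
    try {
      InetAddress.getByName(hostname);
      return true;
    } catch (UnknownHostException e) {
      // The host was decommissioned or we moved to another network:
      // report it as unusable instead of crashing.
      return false;
    }
  }

  /**
   * Takes a server name in the "host,port,startcode" form seen in the log
   * above and reports whether its host part still resolves.
   */
  static boolean previousLocationUsable(String serverName) {
    String host = serverName.split(",", 2)[0];
    return hostResolves(host);
  }

  public static void main(String[] args) {
    // ".invalid" is a reserved TLD, so this name should never resolve.
    String stale = "decommissioned-host.invalid,54092,1330315542087";
    if (previousLocationUsable(stale)) {
      System.out.println("old location resolves; verify the region there");
    } else {
      System.out.println("old location unresolvable; reassign the catalog region");
    }
  }
}
```

In a real master the "unusable" answer would lead to reassigning the catalog region rather than a println; how the actual patch wires this in is not visible in this thread.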
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217058#comment-13217058 ]

Phabricator commented on HBASE-5074:

dhruba has commented on the revision [jira] [HBASE-5074] Support checksums in HBase block cache.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:409 As far as I know, it is not possible to obtain a FileSystem object from an FSDataInputStream.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:235 Yes, if we bump the major version to V3, then we can restart minorVersions from 0.

REVISION DETAIL
  https://reviews.facebook.net/D1521

support checksums in HBase block cache
--
Key: HBASE-5074
URL: https://issues.apache.org/jira/browse/HBASE-5074
Project: HBase
Issue Type: Improvement
Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch, D1521.4.patch, D1521.4.patch, D1521.5.patch, D1521.5.patch, D1521.6.patch, D1521.6.patch, D1521.7.patch, D1521.7.patch, D1521.8.patch, D1521.8.patch, D1521.9.patch, D1521.9.patch

The current implementation of HDFS stores the data in one block file and the metadata (checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk I/O operations: one to the data file and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk IOPS that the storage hardware offers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
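The description's point is that keeping checksums in a separate file costs a second disk read. A minimal sketch of the alternative, assuming a layout of block data followed by one CRC32 per fixed-size chunk, so that a single read returns both. The class name, method names, and layout are illustrative only and are not HBase's actual HFile block format.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/**
 * Illustrative layout (an assumption, not HFile v2): [data][crc0][crc1]...
 * with one 4-byte CRC32 per bytesPerChecksum-sized chunk of data.
 */
class InlineChecksumBlock {

  /** Returns the data followed by one CRC32 per chunk. */
  static byte[] writeBlock(byte[] data, int bytesPerChecksum) {
    int chunks = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
    ByteBuffer out = ByteBuffer.allocate(data.length + 4 * chunks);
    out.put(data);
    CRC32 crc = new CRC32();
    for (int off = 0; off < data.length; off += bytesPerChecksum) {
      crc.reset();
      crc.update(data, off, Math.min(bytesPerChecksum, data.length - off));
      out.putInt((int) crc.getValue());
    }
    return out.array();
  }

  /**
   * Recomputes each chunk's CRC32 and compares it with the stored one.
   * dataLength would normally come from a block header (assumed here).
   */
  static boolean verifyBlock(byte[] block, int dataLength, int bytesPerChecksum) {
    ByteBuffer sums = ByteBuffer.wrap(block, dataLength, block.length - dataLength);
    CRC32 crc = new CRC32();
    for (int off = 0; off < dataLength; off += bytesPerChecksum) {
      crc.reset();
      crc.update(block, off, Math.min(bytesPerChecksum, dataLength - off));
      if (sums.getInt() != (int) crc.getValue()) {
        // Mismatch: a reader could re-read with HDFS-level checksums (assumption).
        return false;
      }
    }
    return true;
  }
}
```

Because data and checksums sit in the same byte range, one seek and one read are enough to both load and validate a block, which is the saving the issue is after.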
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217091#comment-13217091 ]

Phabricator commented on HBASE-5074:

dhruba has commented on the revision [jira] [HBASE-5074] Support checksums in HBase block cache.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:451-452 I think it is better not to add another 4 bytes to the HFileBlock (which increases heapSize) and instead just compute it when needed, especially since this method is used only for debugging.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:529-530 Shall we avoid increasing the heapSize and compute headerSize instead? It should be really cheap to compute headerSize(), especially since it is likely to be inlined.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1636 I think we should always print this. This follows the precedent in other parts of the HBase code, and this code path is the exception, not the norm.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1642-1644 I am pretty sure that it is better to construct this message only if there is a checksum mismatch.
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:3610-3612 The secret is to pass an HFileSystem to HRegion.newHRegion(). This HFileSystem is extracted from the RegionServerServices if it is non-null; otherwise, a default file system object is created and passed into HRegion.newHRegion.
  src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:57-60 getName() is better because it allows annotating the name differently from what Java does via toString (especially if we add new CRC algorithms in the future).
  src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:143-144 I would like to keep getName() because it lets us avoid changing the API if we decide to override Java's toString convention, especially if we add new checksum algorithms in the future. (This is similar to why there are two separate methods, Enum.name and Enum.toString.)
  src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:179 That's right. But the existence of this API allows us to use our own names in the future. (Also, when there are only two or three values, this might be better than looking into a map.)
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:1 I am not planning to change that. This code is what was there in HFileBlock, so it is good to carry it over into a unit test to be able to generate files in the older format; it is used by unit tests alone. Just replacing it with pre-created file(s) is not very cool, especially because the pre-created file(s) will test only those files, whereas if we keep this code here, we can write more unit tests in the future that generate different files in the older format and test backward compatibility.

REVISION DETAIL
  https://reviews.facebook.net/D1521
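The ChecksumType review comments argue for a getName() that is decoupled from toString(), and for a by-name lookup that simply scans the constants when there are only a few of them. A hypothetical enum illustrating that design; `ChecksumKind`, its constants, and its method names are assumptions for illustration and are not copied from HBase's ChecksumType.

```java
/**
 * Hypothetical enum, not HBase's ChecksumType: getName() is a stable external
 * name, while toString() is free to change for logging.
 */
enum ChecksumKind {
  NONE("NONE"),
  CRC32("CRC32"),
  CRC32C("CRC32C");

  private final String displayName;

  ChecksumKind(String displayName) {
    this.displayName = displayName;
  }

  /** Stable name used in configs/APIs; independent of toString(). */
  String getName() {
    return displayName;
  }

  @Override
  public String toString() {
    // Can change for debugging output without breaking getName() callers.
    return "ChecksumKind[" + displayName + "]";
  }

  /** Linear scan over values(); with two or three constants a map buys little. */
  static ChecksumKind nameToType(String name) {
    for (ChecksumKind kind : values()) {
      if (kind.getName().equals(name)) {
        return kind;
      }
    }
    throw new IllegalArgumentException("Unknown checksum type: " + name);
  }
}
```

This mirrors the Enum.name()/Enum.toString() split the reviewer mentions: name() is final and fixed by the constant identifier, whereas a separate getName() can later diverge from the identifier without an API change.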
[jira] [Updated] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-5074:
---
Attachment: D1521.10.patch

dhruba updated the revision [jira] [HBASE-5074] Support checksums in HBase block cache.
Reviewers: mbautin

Addressed most of Stack/Ted/Mikhail's comments.
Mikhail: I did not change the interfaces of ChecksumType, because I think what we have is more generic and flexible.
Stack: I have been running it successfully under load on a 5-node test cluster for more than 72 hours. Would it be possible for you to give it a basic sanity test?

REVISION DETAIL
  https://reviews.facebook.net/D1521

AFFECTED FILES
  src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
  src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
  src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java
  src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileReaderV1.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestFixedFileTrailer.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
  src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java
  src/main/java/org/apache/hadoop/hbase/HConstants.java
  src/main/java/org/apache/hadoop/hbase/fs
  src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java
  src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java
  src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java
  src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java
  src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java
  src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
  src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java
[jira] [Updated] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-5074: --- Attachment: D1521.10.patch support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch,
D1521.1.patch, D1521.10.patch, D1521.10.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch, D1521.4.patch, D1521.4.patch, D1521.5.patch, D1521.5.patch, D1521.6.patch, D1521.6.patch, D1521.7.patch, D1521.7.patch, D1521.8.patch, D1521.8.patch, D1521.9.patch, D1521.9.patch The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the
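To make the two-I/O problem concrete, here is a minimal sketch of the general idea of keeping checksums inline with the data: each block carries one CRC32 value per fixed-size chunk, so a single read returns both data and checksums, and verification happens in memory. The class name, the layout (data followed by 4-byte big-endian CRC32 values) and the chunk size are assumptions for illustration only; this is not the on-disk HFile format introduced by the D1521 patch.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/**
 * Hypothetical sketch: a block whose per-chunk CRC32 checksums live in the
 * same buffer as the data, so one read returns both.
 * Assumed layout: [data bytes][one 4-byte big-endian CRC32 per chunk].
 */
final class InlineChecksumBlock {

    /** Returns data followed by one CRC32 per bytesPerChecksum-sized chunk. */
    static byte[] withChecksums(byte[] data, int bytesPerChecksum) {
        int chunks = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
        ByteBuffer out = ByteBuffer.allocate(data.length + 4 * chunks);
        out.put(data);
        CRC32 crc = new CRC32();
        for (int off = 0; off < data.length; off += bytesPerChecksum) {
            crc.reset();
            crc.update(data, off, Math.min(bytesPerChecksum, data.length - off));
            out.putInt((int) crc.getValue()); // low 32 bits hold the CRC
        }
        return out.array();
    }

    /**
     * Verifies the inline checksums of a block obtained with a single read.
     * Returns null if every chunk matches, otherwise a description of the
     * first mismatch. The message is only built on the failure path.
     */
    static String verify(byte[] block, int dataLength, int bytesPerChecksum) {
        ByteBuffer sums = ByteBuffer.wrap(block, dataLength, block.length - dataLength);
        CRC32 crc = new CRC32();
        for (int off = 0; off < dataLength; off += bytesPerChecksum) {
            int len = Math.min(bytesPerChecksum, dataLength - off);
            crc.reset();
            crc.update(block, off, len);
            int expected = sums.getInt();
            int actual = (int) crc.getValue();
            if (expected != actual) {
                return "checksum mismatch in chunk at offset " + off
                        + ": expected " + expected + ", computed " + actual;
            }
        }
        return null;
    }
}
```

For example, with 512 bytes per checksum a 64 KB block carries 128 checksums (512 extra bytes), and `verify` returns null for an intact block. The checksum overhead is paid in space inside the block rather than in a second disk seek.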
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217094#comment-13217094 ]

Phabricator commented on HBASE-5074:

dhruba has commented on the revision "[jira] [HBASE-5074] Support checksums in HBase block cache".

INLINE COMMENTS

src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:451-452
I think it is better not to add another 4 bytes to the HFileBlock (that increases heapSize), and instead just compute the value when needed, especially since this method is used only for debugging.

src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:529-530
Shall we avoid increasing the heapSize and compute headerSize() instead? It should be really cheap to compute headerSize(), especially since it is likely to be inlined.

src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1636
I think we should always print this. This follows the precedent in other parts of the HBase code, and this code path is the exception, not the norm.

src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1642-1644
I am pretty sure that it is better to construct this message only if there is a checksum mismatch.

src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:3610-3612
The secret is to pass an HFileSystem to HRegion.newHRegion(). This HFileSystem is extracted from the RegionServerServices if that is non-null; otherwise, a default file system object is created and passed into HRegion.newHRegion().

src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:57-60
getName() is better because it allows annotating the name differently from what Java does via toString() (especially if we add new CRC algorithms in the future).

src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:143-144
I would like to keep getName() because it means we do not have to change the API if we decide to override Java's toString() convention, especially if we add new checksum algorithms in the future. (This is similar to why there are two separate methods, Enum.name() and Enum.toString().)

src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:179
That's right. But the existence of this API allows us to use our own names in the future. (Also, when there are only two or three values, this might be better than a map lookup.)

src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:1
I am not planning to change that. This code is what was in HFileBlock, so it is good to carry it over into a unit test to be able to generate files in the older format; it is used by unit tests alone. Just replacing it with pre-created files is not very cool, especially because pre-created files test only those specific files, whereas if we keep this code here, we can write more unit tests in the future that generate different files in the older format and test backward compatibility.

REVISION DETAIL
https://reviews.facebook.net/D1521
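The heap-size trade-off raised on HFileBlock.java:451-452 and 529-530 can be illustrated with a small sketch: rather than caching the header size in an extra int field on every cached block, derive it on demand from state the object is assumed to hold already. The class name, the flag, and the byte counts below are hypothetical and chosen only for illustration.

```java
/**
 * Hypothetical block header. Instead of storing headerSize in its own
 * 4-byte field (growing the heap footprint of every cached block), it is
 * derived from a format flag the object is assumed to keep anyway.
 */
final class BlockHeaderSketch {
    // Illustrative sizes only: a base header, plus extra bytes when the
    // format carries checksum metadata (type, bytes per checksum, data size).
    static final int BASE_HEADER_SIZE = 24;
    static final int CHECKSUM_HEADER_EXTRA = 1 + 4 + 4;

    // Assumed to be stored already for other reasons (e.g. parsing).
    private final boolean includesChecksum;

    BlockHeaderSketch(boolean includesChecksum) {
        this.includesChecksum = includesChecksum;
    }

    /** A branch and an add: cheap to recompute, and a natural inlining candidate. */
    int headerSize() {
        return includesChecksum ? BASE_HEADER_SIZE + CHECKSUM_HEADER_EXTRA : BASE_HEADER_SIZE;
    }
}
```

The design choice is a few CPU cycles per call in exchange for not paying 4 bytes on every block held in the block cache.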
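The ChecksumType.java comments argue for a getName() accessor that is separate from Enum.name() and toString(), and for a plain scan over the constants instead of a map when there are only a few of them. A minimal sketch of that pattern follows; the enum name ChecksumKind, its constants, and the byte codes are hypothetical and not taken from the patch.

```java
/**
 * Hypothetical checksum-type enum. getName() is its own API, so the
 * externally visible name can later diverge from Enum.name()/toString()
 * without changing callers.
 */
enum ChecksumKind {
    NULL((byte) 0, "NULL"),
    CRC32((byte) 1, "CRC32"),
    CRC32C((byte) 2, "CRC32C");

    private final byte code;
    private final String displayName;

    ChecksumKind(byte code, String displayName) {
        this.code = code;
        this.displayName = displayName;
    }

    /** Compact code, e.g. for a serialized header byte. */
    byte getCode() {
        return code;
    }

    /** Name used in configuration and logs; independent of Enum.name(). */
    String getName() {
        return displayName;
    }

    /** With only a handful of constants, a linear scan may be no slower than a map lookup. */
    static ChecksumKind nameToType(String name) {
        for (ChecksumKind t : values()) {
            if (t.getName().equals(name)) {
                return t;
            }
        }
        throw new IllegalArgumentException("Unknown checksum type: " + name);
    }
}
```

If a display name ever needs to change (say, to add a vendor suffix), only the constructor argument changes; Enum.name(), which Java fixes to the constant identifier, stays available for serialization.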