date:20120226

[jira] [Commented] (HBASE-5075) regionserver crashed and failover

2012-02-26 Thread zhiyuan.dai (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216672#comment-13216672
 ] 

zhiyuan.dai commented on HBASE-5075:


@Jesse
Thanks for your reply.
I think HBase is an online DB. How long an HBase failover takes is very 
important. Although kill -9 or a network partition is a big event, the 
supervisor can detect within milliseconds that its regionserver has crashed, 
and the HMaster can then move the regions that were open on the crashed 
regionserver to other live regionservers. The failover time is therefore 
reduced to an acceptable level.

As stack and Lars said, the shutdown hook is called only when the regionserver 
process is alive and the program logic isn't interrupted. A kill -9 can't 
trigger the shutdown hook, so the method deleteMyEphemeralNode would not be 
executed, in which case we'd need to rely on the ZK timeout.

My patch is intended to reduce the failover time, which improves the 
availability of HBase. We have some big online HBase clusters that all serve 
core applications, and the acceptable failover time for these applications is 
about 10s~20s, which includes splitting the hlog, recovering the hlog lease, 
and the 'zk timeout'.
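
The supervisor idea above can be sketched as a small watchdog that polls a 
process id and fires a callback once the process is gone. This is only an 
illustration, not the attached patch: PidWatchdog, the onDeath callback, and 
the injectable liveness check are hypothetical names. The callback is where a 
real supervisor would delete the regionserver's znode and force-close the 
hlog. Note that a pid check only covers process death, not a network 
partition.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.LongPredicate;

/** Hypothetical sketch: polls a pid and runs a callback once when the process is gone. */
class PidWatchdog {
    private final long pid;
    private final LongPredicate isAlive;  // liveness check, injectable for tests
    private final Runnable onDeath;       // assumed hook: delete the znode, close the hlog
    private final AtomicBoolean fired = new AtomicBoolean(false);

    PidWatchdog(long pid, LongPredicate isAlive, Runnable onDeath) {
        this.pid = pid;
        this.isAlive = isAlive;
        this.onDeath = onDeath;
    }

    /** Default liveness check based on the JDK 9+ ProcessHandle API. */
    static boolean processExists(long pid) {
        return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
    }

    /** One poll: returns true if the process is dead; the callback runs at most once. */
    boolean checkOnce() {
        if (isAlive.test(pid)) {
            return false;
        }
        if (fired.compareAndSet(false, true)) {
            onDeath.run();
        }
        return true;
    }

    /** Polls every periodMillis and stops polling once the process is found dead. */
    ScheduledExecutorService start(long periodMillis) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleAtFixedRate(() -> {
            if (checkOnce()) {
                ses.shutdown();
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        return ses;
    }
}
```

A caller might construct it as new PidWatchdog(pid, PidWatchdog::processExists, 
cleanup).start(100), with the 100ms period taken from the issue description.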

 regionserver crashed and failover
 -

 Key: HBASE-5075
 URL: https://issues.apache.org/jira/browse/HBASE-5075
 Project: HBase
  Issue Type: Improvement
  Components: monitoring, regionserver, replication, zookeeper
Affects Versions: 0.92.1
Reporter: zhiyuan.dai
 Fix For: 0.90.5

 Attachments: Degion of Failure Detection.pdf, HBase-5075-shell.patch, 
 HBase-5075-src.patch


 When a regionserver crashes, it takes too long to notify the HMaster, and once 
 the HMaster knows of the regionserver's shutdown, it takes a long time to 
 fetch the hlog's lease.
 HBase is an online DB, so availability is very important.
 I have an idea to improve availability: a monitor node checks the 
 regionserver's pid. If the pid does not exist, I consider the RS down, delete 
 the znode, and force-close the hlog file.
 The detection period could then be about 100ms.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-3909) Add dynamic config

2012-02-26 Thread Jimmy Xiang (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216758#comment-13216758
 ] 

Jimmy Xiang commented on HBASE-3909:


@Stack, we don't have to poll the fs to find changes. We can just put the 
last modified date of the file in ZK. Once the last modified date changes, we 
can load the file again.

When a new regionserver joins a cluster, it should always check whether any 
configuration has changed, based on the configuration file's last modified 
date, which acts as a kind of version number for the file.
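
A minimal sketch of the comparison described above, assuming the file's last 
modified time is published somewhere every regionserver can read it (ZK in 
this comment). ConfigVersionTracker and its methods are hypothetical names, 
and the ZK read itself is left out: the caller passes in the published 
timestamp.

```java
/** Hypothetical sketch: reload configuration only when the published stamp changes. */
class ConfigVersionTracker {
    private long lastSeenModified = -1L;  // -1 means nothing has been loaded yet
    private int reloads = 0;

    /**
     * @param publishedModified last modified time of the config file, as published (assumed: in ZK)
     * @param reload action that re-reads the configuration file
     * @return true if a reload was triggered
     */
    synchronized boolean onPublishedStamp(long publishedModified, Runnable reload) {
        if (publishedModified == lastSeenModified) {
            return false;  // same "version": nothing to do
        }
        reload.run();
        lastSeenModified = publishedModified;
        reloads++;
        return true;
    }

    synchronized int reloadCount() {
        return reloads;
    }
}
```

Because a fresh tracker starts at -1, a regionserver that has just joined 
always loads the file once, which matches the second paragraph of the comment.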


 Add dynamic config
 --

 Key: HBASE-3909
 URL: https://issues.apache.org/jira/browse/HBASE-3909
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Fix For: 0.94.0


 I'm sure this issue exists already, at least as part of the discussion around 
 making online schema edits possible, but no harm in this having its own issue.  
 Ted started a conversation on this topic up on dev and Todd suggested we 
 look at how Hadoop did it over in HADOOP-7001.


[jira] [Commented] (HBASE-5442) Use builder pattern in StoreFile and HFile

2012-02-26 Thread Phabricator (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216776#comment-13216776
 ] 

Phabricator commented on HBASE-5442:


Kannan has accepted the revision [jira] [HBASE-5442] [89-fb] Use builder 
pattern in StoreFile and HFile.

  looks great!

REVISION DETAIL
  https://reviews.facebook.net/D1941

BRANCH
  hfile_builder10


 Use builder pattern in StoreFile and HFile
 --

 Key: HBASE-5442
 URL: https://issues.apache.org/jira/browse/HBASE-5442
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Fix For: 0.94.0

 Attachments: D1893.1.patch, D1893.2.patch, D1941.1.patch, 
 D1941.2.patch, D1941.3.patch, D1941.4.patch, 
 HFile-StoreFile-builder-2012-02-22_22_49_00.patch


 We have five ways to create an HFile writer, two ways to create a StoreFile 
 writer, and the sets of parameters keep changing, creating a lot of 
 confusion, especially when porting patches across branches. The same thing is 
 happening to HColumnDescriptor. I think we should move to a builder pattern 
 solution, e.g.
 {code:java}
   HFileWriter w = HFile.getWriterBuilder(conf, some common args)
   .setParameter1(value1)
   .setParameter2(value2)
   ...
   .build();
 {code}
 Each parameter setter being on its own line will make merges/cherry-pick work 
 properly, we will not have to even mention default parameters again, and we 
 can eliminate a dozen impossible-to-remember constructors.
 This particular JIRA addresses StoreFile and HFile refactoring. For 
 HColumnDescriptor refactoring see HBASE-5357.
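
To make the shape of the proposal concrete, here is a small self-contained 
builder in the same spirit. It is a sketch only: WriterOptions and its fields 
are hypothetical stand-ins and not the HFile or StoreFile API from the patch.

```java
/** Hypothetical writer options; names are illustrative, not HBase's. */
final class WriterOptions {
    final String path;
    final int blockSize;
    final String compression;

    private WriterOptions(Builder b) {
        this.path = b.path;
        this.blockSize = b.blockSize;
        this.compression = b.compression;
    }

    /** Required arguments go here; everything else has a default. */
    static Builder builder(String path) {
        return new Builder(path);
    }

    static final class Builder {
        private final String path;
        private int blockSize = 64 * 1024;    // default, overridden only when needed
        private String compression = "NONE";  // default, overridden only when needed

        private Builder(String path) {
            this.path = path;
        }

        Builder withBlockSize(int blockSize) {
            this.blockSize = blockSize;
            return this;
        }

        Builder withCompression(String compression) {
            this.compression = compression;
            return this;
        }

        /** Validates once, in one place, instead of in every constructor overload. */
        WriterOptions build() {
            if (blockSize <= 0) {
                throw new IllegalArgumentException("blockSize must be positive");
            }
            return new WriterOptions(this);
        }
    }
}
```

Adding a new optional parameter then means adding one setter and one default, 
and existing call sites do not change.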


[jira] [Created] (HBASE-5480) Fixups to MultithreadedTableMapper for Hadoop 0.23.2+

2012-02-26 Thread Andrew Purtell (Created) (JIRA)

Fixups to MultithreadedTableMapper for Hadoop 0.23.2+
-

 Key: HBASE-5480
 URL: https://issues.apache.org/jira/browse/HBASE-5480
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Reporter: Andrew Purtell
Priority: Critical


There are two issues:

- StatusReporter has a new method getProgress()

- Mapper and reducer context objects can no longer be directly instantiated.

See attached patch. I'm not thrilled with the added reflection but it was the 
minimally intrusive change.

Raised the priority to critical because compilation fails.
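
For readers unfamiliar with the technique, the following is a generic sketch 
of instantiating a class through reflection when it cannot be constructed 
directly at compile time. It is not the attached patch: ReflectiveFactory is a 
hypothetical helper, and the class name and parameter types are supplied by 
the caller. The constructor is looked up once and reused, so each later call 
pays only for the reflective invocation.

```java
import java.lang.reflect.Constructor;

/** Hypothetical sketch: create instances by class name via a cached constructor. */
class ReflectiveFactory<T> {
    private final Constructor<? extends T> ctor;

    ReflectiveFactory(String className, Class<T> base, Class<?>... paramTypes)
            throws ReflectiveOperationException {
        // Resolve the class at runtime so the caller needs no compile-time dependency on it.
        Class<? extends T> clazz = Class.forName(className).asSubclass(base);
        // Look up the public constructor once; later calls reuse it.
        this.ctor = clazz.getConstructor(paramTypes);
    }

    T newInstance(Object... args) throws ReflectiveOperationException {
        return ctor.newInstance(args);
    }
}
```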


[jira] [Updated] (HBASE-5480) Fixups to MultithreadedTableMapper for Hadoop 0.23.2+

2012-02-26 Thread Andrew Purtell (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-5480:
--

Attachment: HBASE-5480.patch


[jira] [Updated] (HBASE-5480) Fixups to MultithreadedTableMapper for Hadoop 0.23.2+

2012-02-26 Thread Andrew Purtell (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-5480:
--

Attachment: (was: HBASE-5480.patch)


[jira] [Updated] (HBASE-5480) Fixups to MultithreadedTableMapper for Hadoop 0.23.2+

2012-02-26 Thread Andrew Purtell (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-5480:
--

Attachment: HBASE-5480.patch

Corrected patch with --no-prefix


[jira] [Commented] (HBASE-5480) Fixups to MultithreadedTableMapper for Hadoop 0.23.2+

2012-02-26 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216861#comment-13216861
 ] 

stack commented on HBASE-5480:
--

+1  Looks grand, Andy.  Reflection is per map invocation?  So, per row?  I 
suppose in the scheme of things that's not too bad.


[jira] [Commented] (HBASE-5075) regionserver crashed and failover

2012-02-26 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216863#comment-13216863
 ] 

stack commented on HBASE-5075:
--

@zhiyuan.dai What do you think of the idea of using supervisor or any of the 
other babysitting programs instead of writing our own from scratch?  If you 
need to have hbase regionservers dump out their servername so you know what to 
kill up in zk, that can be done easily enough.


[jira] [Commented] (HBASE-5480) Fixups to MultithreadedTableMapper for Hadoop 0.23.2+

2012-02-26 Thread Andrew Purtell (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216957#comment-13216957
 ] 

Andrew Purtell commented on HBASE-5480:
---

There was a constructor called at that site already, and another constructor 
called by reflection already above it. This only adds a small incremental cost. 



[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-26 Thread chunhui shen (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216989#comment-13216989
]

chunhui shen commented on HBASE-5270:
-

bq. don't allow it to serve traffic before the actual server is ready.
I think that's inconvenient. For example, before the master is fully
initialized, we need to allow RegionserverReport but disallow admin
operations. Also, server death is detected through ZK, not RPC.

Handle potential data loss due to concurrent processing of processFaileOver
and ServerShutdownHandler
-

Key: HBASE-5270
URL: https://issues.apache.org/jira/browse/HBASE-5270
Project: HBase
Issue Type: Sub-task
Components: master
Reporter: Zhihong Yu
Assignee: chunhui shen
Fix For: 0.92.1, 0.94.0

Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch,
5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch,
5270-testcasev2.patch, hbase-5270.patch, hbase-5270v2.patch,
hbase-5270v4.patch, hbase-5270v5.patch, hbase-5270v6.patch, sampletest.txt

This JIRA continues the effort from HBASE-5179. Starting with Stack's
comments about patches for 0.92 and TRUNK:
Reviewing 0.92v17
isDeadServerInProgress is a new public method in ServerManager but it does
not seem to be used anywhere.
Does isDeadRootServerInProgress need to be public? Ditto for meta version.
This method param names are not right 'definitiveRootServer'; what is meant
by definitive? Do they need this qualifier?
Is there anything in place to stop us expiring a server twice if its carrying
root and meta?
What is difference between asking assignment manager isCarryingRoot and this
variable that is passed in? Should be doc'd at least. Ditto for meta.
I think I've asked for this a few times - onlineServers needs to be
explained... either in javadoc or in comment. This is the param passed into
joinCluster. How does it arise? I think I know but am unsure. God love the
poor noob that comes awandering this code trying to make sense of it all.
It looks like we get the list by trawling zk for regionserver znodes that
have not checked in. Don't we do this operation earlier in master setup? Are
we doing it again here?
Though distributed split log is configured, we will do in-master,
single-process splitting under some conditions with this patch. It's not
explained in code why we would do this. Why do we think master log splitting
is 'high priority' when it could very well be slower? Should we only go this
route if distributed splitting is not going on? Do we know if concurrent
distributed log splitting and master splitting works?
Why would we have dead servers in progress here in master startup? Because a
servershutdownhandler fired?
This patch is different to the patch for 0.90. Should go into trunk first
with tests, then 0.92. Should it be in this issue? This issue is really hard
to follow now. Maybe this issue is for 0.90.x and new issue for more work on
this trunk patch?
This patch needs to have the v18 differences applied.


[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-26 Thread chunhui shen (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216994#comment-13216994
]

chunhui shen commented on HBASE-5270:
-

@stack
Could you take a look at introducing a safemode that delays SSH until after
the master is initialized?
I think this solution is an easier fix for the issue.
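
The safemode idea can be sketched as a gate that queues shutdown handling
until initialization is done. This is a hypothetical illustration, not code
from any of the attached patches: SafeModeGate and its method names are made
up, and handlers are modeled as plain Runnables.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical sketch: hold server-shutdown work until the master is initialized. */
class SafeModeGate {
    private final Queue<Runnable> deferred = new ArrayDeque<>();
    private boolean initialized = false;

    /** Runs the handler now if initialization is done, otherwise queues it. */
    synchronized void submit(Runnable handler) {
        if (initialized) {
            handler.run();
        } else {
            deferred.add(handler);
        }
    }

    /** Leaves safemode and drains the queued handlers in arrival order. */
    synchronized void markInitialized() {
        initialized = true;
        Runnable next;
        while ((next = deferred.poll()) != null) {
            next.run();
        }
    }

    /** Number of handlers still waiting for initialization. */
    synchronized int pending() {
        return deferred.size();
    }
}
```

Running the handlers while holding the lock keeps the sketch short; a real
implementation would more likely hand them to an executor.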


[jira] [Commented] (HBASE-4523) dfs.support.append config should be present in the hadoop configs, we should remove them from hbase so the user is not confused when they see the config in 2 places

2012-02-26 Thread Luke Lu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216997#comment-13216997
 ] 

Luke Lu commented on HBASE-4523:


Doesn't look correct to me, as dfs.support.append is required for syncFs 
detection, independent of hdfs. I'd be fine with syncFs detection to be on all 
the time, though. 

 dfs.support.append config should be present in the hadoop configs, we should 
 remove them from hbase so the user is not confused when they see the config 
 in 2 places
 

 Key: HBASE-4523
 URL: https://issues.apache.org/jira/browse/HBASE-4523
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4, 0.92.0
Reporter: Arpit Gupta
Assignee: Eric Yang
 Fix For: 0.92.1

 Attachments: HBASE-4523.patch





[jira] [Updated] (HBASE-5462) [monitor] Ganglia metric hbase.master.cluster_requests should exclude the scan meta request generated by master, or create a new metric which could show the real request

2012-02-26 Thread johnyang (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

johnyang updated HBASE-5462:


Affects Version/s: 0.90.5
   0.92.0

 [monitor] Ganglia metric hbase.master.cluster_requests should exclude the 
 scan meta request generated by master, or create a new metric which could 
 show the real request from client
 -

 Key: HBASE-5462
 URL: https://issues.apache.org/jira/browse/HBASE-5462
 Project: HBase
  Issue Type: Bug
  Components: monitoring
Affects Versions: 0.90.5, 0.92.0
 Environment: hbase 0.90.5
Reporter: johnyang
   Original Estimate: 48h
  Remaining Estimate: 48h

 We have a big table with 30k regions, but the request rate is not very high 
 (about 50K per day).
 We use the hbase.master.cluster_requests metric to monitor cluster requests, 
 but find that lots of requests are generated by the master, which scans the 
 meta table at regular intervals.
 This makes it hard for us to monitor the real requests from clients. Is it 
 possible to filter out the meta table scans, or to create a new metric that 
 shows the real requests from clients?
 Thank you.
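
One way to picture the second option is a pair of counters, with the client 
figure derived from the total. This is a hypothetical sketch: RequestCounters 
and the fromMasterMetaScan flag are invented for illustration and do not 
correspond to the HBase metrics classes.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch: count client requests separately from master meta scans. */
class RequestCounters {
    private final LongAdder total = new LongAdder();
    private final LongAdder masterMetaScans = new LongAdder();

    /** Records one request; the flag marks scans of the meta table issued by the master. */
    void record(boolean fromMasterMetaScan) {
        total.increment();
        if (fromMasterMetaScan) {
            masterMetaScans.increment();
        }
    }

    /** Everything, as the existing metric reports it. */
    long clusterRequests() {
        return total.sum();
    }

    /** Requests that did not originate from the master's periodic meta scans. */
    long clientRequests() {
        return total.sum() - masterMetaScans.sum();
    }
}
```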


[jira] [Commented] (HBASE-4523) dfs.support.append config should be present in the hadoop configs, we should remove them from hbase so the user is not confused when they see the config in 2 places

2012-02-26 Thread Luke Lu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217010#comment-13217010
 ] 

Luke Lu commented on HBASE-4523:


My comment above only applies to 0.90.x; 0.92.x has HBASE-2233, which got rid 
of dfs.support.append in the HBase code.


[jira] [Commented] (HBASE-5075) regionserver crashed and failover

2012-02-26 Thread zhiyuan.dai (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217014#comment-13217014
 ] 

zhiyuan.dai commented on HBASE-5075:


@stack
First, thank you.
Sorry, I don't quite understand what you mean. Do you mean using another 
project instead of writing code into HBase?



[jira] [Created] (HBASE-5481) Uncaught UnknownHostException prevents HBase from starting

2012-02-26 Thread Benoit Sigoure (Created) (JIRA)

Uncaught UnknownHostException prevents HBase from starting
--

 Key: HBASE-5481
 URL: https://issues.apache.org/jira/browse/HBASE-5481
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Benoit Sigoure
Assignee: Benoit Sigoure


If a host gets decommissioned and its hostname no longer resolves, and it was 
previously hosting ROOT or META, HBase won't be able to start up.  This easily 
happens when moving across networks (e.g. developing HBase on a laptop), but 
can also happen during cluster-wide maintenances where HBase is shut down, then 
one or more nodes get decommissioned such that their hostnames no longer 
resolve.
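
The general shape of a fix for this class of failure is to catch the 
exception where the old location is checked and treat the server as gone. 
The sketch below illustrates that idea only and is not the attached patch: 
HostCheck and its Resolver seam are hypothetical names, while 
InetAddress.getByName and UnknownHostException are the standard JDK API.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical sketch: treat an unresolvable hostname as "server is gone". */
class HostCheck {
    /** Resolver seam so the check can be exercised without real DNS. */
    interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    static boolean isResolvable(String host, Resolver resolver) {
        try {
            resolver.resolve(host);
            return true;
        } catch (UnknownHostException e) {
            // The previous ROOT/META host no longer resolves: report it as
            // unavailable so the caller can reassign instead of aborting startup.
            return false;
        }
    }

    /** Convenience overload that uses the JDK resolver. */
    static boolean isResolvable(String host) {
        return isResolvable(host, InetAddress::getByName);
    }
}
```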

{code}
2012-02-26 20:05:48,339 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Assigning region -ROOT-,,0.70236052 to nowwhat.tsunanet.net,54092,1330315542087
[...]
2012-02-26 20:05:48,456 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Onlined -ROOT-,,0.70236052; next sequenceid=268
2012-02-26 20:05:48,456 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
regionserver:54092-0x135bcfbb0580001 Attempting to transition node 
70236052/-ROOT- from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING
2012-02-26 20:05:48,458 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
regionserver:54092-0x135bcfbb0580001 Successfully transitioned node 70236052 
from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING
2012-02-26 20:05:48,459 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENING, 
server=nowwhat.tsunanet.net,54092,1330315542087, region=70236052/-ROOT-
2012-02-26 20:05:48,459 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Post open deploy tasks for 
region=-ROOT-,,0.70236052, daughter=false
2012-02-26 20:05:48,460 INFO 
org.apache.hadoop.hbase.catalog.RootLocationEditor: Setting ROOT region 
location in ZooKeeper as nowwhat.tsunanet.net,54092,1330315542087
2012-02-26 20:05:48,466 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Done with post open deploy 
task for region=-ROOT-,,0.70236052, daughter=false
2012-02-26 20:05:48,466 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
regionserver:54092-0x135bcfbb0580001 Attempting to transition node 
70236052/-ROOT- from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED
2012-02-26 20:05:48,468 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
regionserver:54092-0x135bcfbb0580001 Successfully transitioned node 70236052 
from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED
2012-02-26 20:05:48,468 DEBUG 
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: region 
transitioned to opened in zookeeper: {NAME = '-ROOT-,,0', STARTKEY = '', 
ENDKEY = '', ENCODED = 70236052,}, server: 
nowwhat.tsunanet.net,54092,1330315542087
2012-02-26 20:05:48,468 DEBUG 
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened 
-ROOT-,,0.70236052 on server:nowwhat.tsunanet.net,54092,1330315542087
2012-02-26 20:05:48,468 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENED, 
server=nowwhat.tsunanet.net,54092,1330315542087, region=70236052/-ROOT-
2012-02-26 20:05:48,470 INFO 
org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED 
event for -ROOT-,,0.70236052 from nowwhat.tsunanet.net,54092,1330315542087; 
deleting unassigned node
2012-02-26 20:05:48,470 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:54081-0x135bcfbb058 Deleting existing unassigned node for 70236052 
that is in expected state RS_ZK_REGION_OPENED
2012-02-26 20:05:48,472 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
The znode of region -ROOT-,,0.70236052 has been deleted.
2012-02-26 20:05:48,472 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:54081-0x135bcfbb058 Successfully deleted unassigned node for region 
70236052 in expected state RS_ZK_REGION_OPENED
2012-02-26 20:05:48,472 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
The master has opened the region -ROOT-,,0.70236052 that was online on 
nowwhat.tsunanet.net,54092,1330315542087
2012-02-26 20:05:48,473 INFO org.apache.hadoop.hbase.master.HMaster: -ROOT- 
assigned=1, rit=false, location=nowwhat.tsunanet.net,54092,1330315542087
2012-02-26 20:05:48,486 DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: 
Lookedup root region location, 
connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@16d0a6a3;
 serverName=nowwhat.tsunanet.net,54092,1330315542087
2012-02-26 20:05:48,488 DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: 
Lookedup root region location, 
connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@16d0a6a3;
 serverName=nowwhat.tsunanet.net,54092,1330315542087
2012-02-26 20:05:48,620 FATAL org.apache.hadoop.hbase.master.HMaster: Master

[jira] [Updated] (HBASE-5481) Uncaught UnknownHostException prevents HBase from starting

2012-02-26 Thread Benoit Sigoure (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoit Sigoure updated HBASE-5481:
--

Status: Patch Available  (was: Open)

 Uncaught UnknownHostException prevents HBase from starting
 --

 Key: HBASE-5481
 URL: https://issues.apache.org/jira/browse/HBASE-5481
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Benoit Sigoure
Assignee: Benoit Sigoure
 Attachments: 
 0001-Properly-handle-UnknownHostException-when-checking-M.patch


 If a host gets decommissioned and its hostname no longer resolves, and it was 
 previously hosting ROOT or META, HBase won't be able to start up.  This 
 easily happens when moving across networks (e.g. developing HBase on a 
 laptop), but can also happen during cluster-wide maintenances where HBase is 
 shut down, then one or more nodes get decommissioned such that their 
 hostnames no longer resolve.
 {code}
 2012-02-26 20:05:48,339 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 -ROOT-,,0.70236052 to nowwhat.tsunanet.net,54092,1330315542087
 [...]
 2012-02-26 20:05:48,456 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
 Onlined -ROOT-,,0.70236052; next sequenceid=268
 2012-02-26 20:05:48,456 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:54092-0x135bcfbb0580001 Attempting to transition node 
 70236052/-ROOT- from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING
 2012-02-26 20:05:48,458 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:54092-0x135bcfbb0580001 Successfully transitioned node 70236052 
 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING
 2012-02-26 20:05:48,459 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, 
 server=nowwhat.tsunanet.net,54092,1330315542087, region=70236052/-ROOT-
 2012-02-26 20:05:48,459 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Post open deploy tasks 
 for region=-ROOT-,,0.70236052, daughter=false
 2012-02-26 20:05:48,460 INFO 
 org.apache.hadoop.hbase.catalog.RootLocationEditor: Setting ROOT region 
 location in ZooKeeper as nowwhat.tsunanet.net,54092,1330315542087
 2012-02-26 20:05:48,466 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Done with post open 
 deploy task for region=-ROOT-,,0.70236052, daughter=false
 2012-02-26 20:05:48,466 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:54092-0x135bcfbb0580001 Attempting to transition node 
 70236052/-ROOT- from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED
 2012-02-26 20:05:48,468 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:54092-0x135bcfbb0580001 Successfully transitioned node 70236052 
 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED
 2012-02-26 20:05:48,468 DEBUG 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: region 
 transitioned to opened in zookeeper: {NAME => '-ROOT-,,0', STARTKEY => '', 
 ENDKEY => '', ENCODED => 70236052,}, server: 
 nowwhat.tsunanet.net,54092,1330315542087
 2012-02-26 20:05:48,468 DEBUG 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened 
 -ROOT-,,0.70236052 on server:nowwhat.tsunanet.net,54092,1330315542087
 2012-02-26 20:05:48,468 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENED, 
 server=nowwhat.tsunanet.net,54092,1330315542087, region=70236052/-ROOT-
 2012-02-26 20:05:48,470 INFO 
 org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED 
 event for -ROOT-,,0.70236052 from nowwhat.tsunanet.net,54092,1330315542087; 
 deleting unassigned node
 2012-02-26 20:05:48,470 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:54081-0x135bcfbb058 Deleting existing unassigned node for 70236052 
 that is in expected state RS_ZK_REGION_OPENED
 2012-02-26 20:05:48,472 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: The znode of region 
 -ROOT-,,0.70236052 has been deleted.
 2012-02-26 20:05:48,472 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:54081-0x135bcfbb058 Successfully deleted unassigned node for 
 region 70236052 in expected state RS_ZK_REGION_OPENED
 2012-02-26 20:05:48,472 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: The master has opened the 
 region -ROOT-,,0.70236052 that was online on 
 nowwhat.tsunanet.net,54092,1330315542087
 2012-02-26 20:05:48,473 INFO org.apache.hadoop.hbase.master.HMaster: -ROOT- 
 assigned=1, rit=false, location=nowwhat.tsunanet.net,54092,1330315542087
 2012-02-26 20:05:48,486 DEBUG 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: 
 Lookedup root region location, 
 connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@16d0a6a3;

[jira] [Updated] (HBASE-5481) Uncaught UnknownHostException prevents HBase from starting

2012-02-26 Thread Benoit Sigoure (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoit Sigoure updated HBASE-5481:
--

Attachment: 0001-Properly-handle-UnknownHostException-when-checking-M.patch

Proposed patch to fix the issue.
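
For readers who have not opened the attachment, here is a minimal sketch of the general idea in the issue title: catch the UnknownHostException from a hostname lookup and treat the host as gone, rather than letting the exception escape and abort startup. This is not the attached patch. The class name, the Resolver interface and isResolvable() are hypothetical names introduced only for illustration, and the real check in HBase may be structured quite differently.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class HostResolutionSketch {

    /** Hypothetical lookup hook, so the failure case can be exercised without DNS. */
    interface Resolver {
        InetAddress resolve(String hostname) throws UnknownHostException;
    }

    /**
     * Returns true if the hostname resolves and false if the lookup fails.
     * The UnknownHostException is handled here instead of propagating upward.
     */
    static boolean isResolvable(String hostname, Resolver resolver) {
        try {
            resolver.resolve(hostname);
            return true;
        } catch (UnknownHostException e) {
            // Assumption: an unresolvable host is treated the same as a dead
            // server, so the caller can go on to reassign whatever it hosted.
            return false;
        }
    }

    public static void main(String[] args) {
        // An IP literal is only validated, so no name-service lookup is needed.
        System.out.println(isResolvable("127.0.0.1", InetAddress::getByName));
        // Simulate a decommissioned host whose name no longer resolves.
        System.out.println(isResolvable("decommissioned.example",
            h -> { throw new UnknownHostException(h); }));
    }
}
```

Passing the lookup in as a parameter is only a convenience for the sketch; it makes the unresolvable-host path easy to trigger in a test.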

[jira] [Commented] (HBASE-5481) Uncaught UnknownHostException prevents HBase from starting

2012-02-26 Thread Hadoop QA (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217025#comment-13217025
 ] 

Hadoop QA commented on HBASE-5481:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12516136/0001-Properly-handle-UnknownHostException-when-checking-M.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1051//console

This message is automatically generated.

[jira] [Commented] (HBASE-5481) Uncaught UnknownHostException prevents HBase from starting

2012-02-26 Thread Zhihong Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217032#comment-13217032
 ] 

Zhihong Yu commented on HBASE-5481:
---

Patch looks reasonable.
But a patch for TRUNK should be generated separately.

[jira] [Commented] (HBASE-5074) support checksums in HBase block cache

2012-02-26 Thread Phabricator (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217058#comment-13217058
 ] 

Phabricator commented on HBASE-5074:


dhruba has commented on the revision [jira] [HBASE-5074] Support checksums in 
HBase block cache.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:409 As far as I 
know, it is not possible to obtain a FileSystem object from an FSDataInputStream.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:235 Yes, if we 
bump the major version to V3, then we can restart minorVersions from 0.

REVISION DETAIL
  https://reviews.facebook.net/D1521


 support checksums in HBase block cache
 --

 Key: HBASE-5074
 URL: https://issues.apache.org/jira/browse/HBASE-5074
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, 
 D1521.2.patch, D1521.3.patch, D1521.3.patch, D1521.4.patch, D1521.4.patch, 
 D1521.5.patch, D1521.5.patch, D1521.6.patch, D1521.6.patch, D1521.7.patch, 
 D1521.7.patch, D1521.8.patch, D1521.8.patch, D1521.9.patch, D1521.9.patch


 The current implementation of HDFS stores the data in one block file and the 
 metadata (checksum) in another block file. This means that every read into the 
 HBase block cache actually consumes two disk iops, one to the data file and 
 one to the checksum file. This is a major problem for scaling HBase, because 
 HBase is usually bottlenecked on the number of random disk iops that the 
 storage hardware offers.
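
The idea of keeping the checksum with the data, so that one read returns both, can be sketched roughly as follows. This is an illustration only and not the HFile block format from the patch. The class name and the layout (payload followed by a trailing 4-byte CRC32) are assumptions chosen for brevity.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.CRC32;

final class InlineChecksumSketch {

    /** Returns payload followed by a 4-byte CRC32 of that payload. */
    static byte[] writeBlock(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        ByteBuffer buf = ByteBuffer.allocate(payload.length + 4);
        buf.put(payload);
        buf.putInt((int) crc.getValue()); // low 32 bits hold the CRC32 value
        return buf.array();
    }

    /** Verifies the trailing checksum and returns the payload, or throws on mismatch. */
    static byte[] readBlock(byte[] block) {
        int payloadLen = block.length - 4;
        if (payloadLen < 0) {
            throw new IllegalArgumentException("block too short to hold a checksum");
        }
        CRC32 crc = new CRC32();
        crc.update(block, 0, payloadLen);
        int stored = ByteBuffer.wrap(block, payloadLen, 4).getInt();
        if (stored != (int) crc.getValue()) {
            throw new IllegalStateException("checksum mismatch");
        }
        return Arrays.copyOf(block, payloadLen);
    }

    public static void main(String[] args) {
        byte[] block = writeBlock(new byte[] {1, 2, 3, 4});
        // Data and checksum come back from the same buffer, i.e. one read.
        System.out.println(Arrays.toString(readBlock(block)));
    }
}
```

Because the checksum travels in the same buffer as the data, verification needs no second file access, which is the saving the description is after.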

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5074) support checksums in HBase block cache

2012-02-26 Thread Phabricator (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217091#comment-13217091
 ] 

Phabricator commented on HBASE-5074:


dhruba has commented on the revision [jira] [HBASE-5074] Support checksums in 
HBase block cache.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:451-452 I 
think it is better to not add another 4 bytes to the HFileBlock (increases 
heapSize), instead just compute it when needed, especially since this method is 
used only for debugging.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:529-530 shall 
we avoid increasing the HeapSize vs computing headerSize? It should be really 
cheap to compute headerSize(), especially since it is likely to be inlined.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1636 I think 
we should always print this. This follows the precedent in other parts of the 
HBase code. And this code path is the exception and not the norm.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1642-1644 I am 
pretty sure that it is better to construct this message only if there is a 
checksum mismatch.
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:3610-3612 The 
secret is to pass in a HFileSystem to HRegion.newHRegion(). This HFileSystem is 
extracted from the RegionServerServices, if it is not-null. Otherwise, a 
default file system object is created and passed into HRegion.newHRegion
  src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:57-60 getName() 
is better because it allows annotating the name differently from what Java does 
via toString (especially if we add new CRC algorithms in the future).
  src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:143-144 I would 
like to keep getName() because it allows us to not change the API if we decide 
to override Java's toString convention, especially if we add new checksum 
algorithms in the future. (Similar to why there are two separate methods, 
Enum.name and Enum.toString.)
  src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:179 That's 
right. But the existence of this API allows us to define our own names in the 
future. (Also, when there are only two or three values, this might be better 
than looking into a map.)
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:1 
I am not planning to change that. This code is what was there in HFileBlock, 
so it is good to carry it over in a unit test to be able to generate files in 
the older format. This is used by unit tests alone.

  Just replacing it with pre-created file(s) is not very cool, especially 
because the pre-created file(s) will test only that file, whereas if we keep 
this code here, we can write more and more unit tests in the future that can 
generate different files in the older format and test backward compatibility.
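
The getName() versus toString() point in the comments above can be illustrated with a small enum. This is a sketch under stated assumptions and not the ChecksumType class from the patch. ChecksumKind, its label field and fromName() are hypothetical names. The lookup is a plain loop over the values rather than a map, which is reasonable when there are only two or three of them.

```java
final class ChecksumNameSketch {

    enum ChecksumKind {
        NULL("NULL"),
        CRC32("CRC32");

        // Stable external label, stored separately from Enum.name().
        private final String label;

        ChecksumKind(String label) {
            this.label = label;
        }

        /** Name exposed to callers; free to diverge from toString() later. */
        String getName() {
            return label;
        }

        /** Display form, deliberately different from getName() in this sketch. */
        @Override
        public String toString() {
            return "ChecksumKind(" + label + ")";
        }

        /** Linear scan over the values instead of a map lookup. */
        static ChecksumKind fromName(String name) {
            for (ChecksumKind kind : values()) {
                if (kind.getName().equals(name)) {
                    return kind;
                }
            }
            throw new IllegalArgumentException("unknown checksum name: " + name);
        }
    }

    public static void main(String[] args) {
        ChecksumKind kind = ChecksumKind.fromName("CRC32");
        System.out.println(kind.getName());
        System.out.println(kind);
    }
}
```

Callers that persist or parse the name depend only on getName() and fromName(), so changing toString() for display purposes would not affect them.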

REVISION DETAIL
  https://reviews.facebook.net/D1521


[jira] [Updated] (HBASE-5074) support checksums in HBase block cache

2012-02-26 Thread Phabricator (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-5074:
---

Attachment: D1521.10.patch

dhruba updated the revision [jira] [HBASE-5074] Support checksums in HBase 
block cache.
Reviewers: mbautin

  Addressed most of Stack/Ted/Mikhail's comments.

  Mikhail: I did not change the interfaces of ChecksumType, just because I think
  what we have is more generic and flexible.

  Stack: I have been running it successfully with load on a 5-node test cluster
  for more than 72 hours. Would it be possible for you to take it for a basic
  sanity test?

REVISION DETAIL
  https://reviews.facebook.net/D1521

AFFECTED FILES
  src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
  src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
  src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java
  src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileReaderV1.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestFixedFileTrailer.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
  src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java
  src/main/java/org/apache/hadoop/hbase/HConstants.java
  src/main/java/org/apache/hadoop/hbase/fs
  src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java
  src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java
  src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java
  src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java
  src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java
  src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
  src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java


 support checksums in HBase block cache
 --

 Key: HBASE-5074
 URL: https://issues.apache.org/jira/browse/HBASE-5074
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: D1521.1.patch, D1521.1.patch, D1521.10.patch, 
 D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch, D1521.4.patch, 
 D1521.4.patch, D1521.5.patch, D1521.5.patch, D1521.6.patch, D1521.6.patch, 
 D1521.7.patch, D1521.7.patch, D1521.8.patch, D1521.8.patch, D1521.9.patch, 
 D1521.9.patch


 The current implementation of HDFS stores the data in one block file and the 
 metadata(checksum) in another block file. This means that every read into the 
 HBase block cache actually consumes two disk iops, one to the datafile and 
 one to the checksum file. This is a major problem for scaling HBase, because 
 HBase is usually bottlenecked on the number of random disk iops that the 
 storage-hardware offers.

[jira] [Updated] (HBASE-5074) support checksums in HBase block cache

2012-02-26 Thread Phabricator (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-5074:
---

Attachment: D1521.10.patch

dhruba updated the revision [jira] [HBASE-5074] Support checksums in HBase 
block cache.
Reviewers: mbautin

  Addressed most of Stack/Ted/Mikhail's comments.

  Mikhail: I did not change the interfaces of ChecksumType, just because I think
  what we have is more generic and flexible.

  Stack: I have been running it successfully with load on a 5-node test cluster
  for more than 72 hours. Would it be possible for you to take it for a basic
  sanity test?

REVISION DETAIL
  https://reviews.facebook.net/D1521

AFFECTED FILES
  src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
  src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
  src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java
  src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileReaderV1.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestFixedFileTrailer.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
  src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java
  src/main/java/org/apache/hadoop/hbase/HConstants.java
  src/main/java/org/apache/hadoop/hbase/fs
  src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java
  src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java
  src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java
  src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java
  src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java
  src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
  src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java


 support checksums in HBase block cache
 --

 Key: HBASE-5074
 URL: https://issues.apache.org/jira/browse/HBASE-5074
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: D1521.1.patch, D1521.1.patch, D1521.10.patch, 
 D1521.10.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch, 
 D1521.4.patch, D1521.4.patch, D1521.5.patch, D1521.5.patch, D1521.6.patch, 
 D1521.6.patch, D1521.7.patch, D1521.7.patch, D1521.8.patch, D1521.8.patch, 
 D1521.9.patch, D1521.9.patch


 The current implementation of HDFS stores the data in one block file and the 
 metadata(checksum) in another block file. This means that every read into the 
 HBase block cache actually consumes two disk iops, one to the datafile and 
 one to the checksum file. This is a major problem for scaling HBase, because 
 HBase is usually bottlenecked on the number of random disk iops that the 
 storage-hardware offers.

[jira] [Commented] (HBASE-5074) support checksums in HBase block cache

2012-02-26 Thread Phabricator (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217094#comment-13217094
 ] 

Phabricator commented on HBASE-5074:


dhruba has commented on the revision [jira] [HBASE-5074] Support checksums in 
HBase block cache.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:451-452 I 
think it is better not to add another 4 bytes to the HFileBlock (which increases 
heapSize) and instead just compute it when needed, especially since this method 
is used only for debugging.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:529-530 Shall 
we avoid increasing the heap size and compute headerSize() instead? It should 
be really cheap to compute headerSize(), especially since it is likely to be 
inlined.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1636 I think 
we should always print this. This follows the precedent in other parts of the 
HBase code. And this code path is the exception, not the norm.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1642-1644 I am 
pretty sure that it is better to construct this message only if there is a 
checksum mismatch.
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:3610-3612 The 
secret is to pass an HFileSystem to HRegion.newHRegion(). This HFileSystem is 
extracted from the RegionServerServices, if it is not null. Otherwise, a 
default file system object is created and passed into HRegion.newHRegion().
  src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:57-60 getName() 
is better because it allows annotating the name differently from what Java does 
via toString() (especially if we add new CRC algorithms in the future).
  src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:143-144 I would 
like to keep getName() because it allows us to not change the API if we decide 
to override Java's toString() convention, especially if we add new checksum 
algorithms in the future. (This is similar to why there are two separate 
methods, Enum.name() and Enum.toString().)
  src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:179 That's 
right. But the existence of this API allows us to define our own names in the 
future. (Also, when there are only two or three values, this might be better 
than looking into a map.)
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:1
 I am not planning to change that. This code is what was there in HFileBlock, 
so it is good to carry it over into a unit test to be able to generate files in 
the older format. This is used by unit tests alone.

  Just replacing it with pre-created file(s) is not a good option, especially 
because the pre-created file(s) will test only those files, whereas if we keep 
this code here, we can write more and more unit tests in the future that 
generate different files in the older format and test backward compatibility.
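
  For readers following the getName() discussion above, here is a minimal
sketch of the idea. The enum below is hypothetical (SketchChecksumType and its
constants are made up for illustration) and is not the ChecksumType in D1521.
It only shows how a separate getName() keeps the external name stable, and how
a lookup by name can be a plain scan when there are only a few values.

```java
// Hypothetical enum for illustration only; not the ChecksumType from the patch.
enum SketchChecksumType {
    NULL("NULL"),
    CRC32("CRC32");

    // External name, kept separate from Enum.name() and toString() so that
    // either of those can change without changing what callers see.
    private final String name;

    SketchChecksumType(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    // Linear scan over values(); with two or three constants this is cheap
    // and avoids maintaining a separate map.
    public static SketchChecksumType nameToType(String name) {
        for (SketchChecksumType t : values()) {
            if (t.getName().equals(name)) {
                return t;
            }
        }
        throw new IllegalArgumentException("Unknown checksum type: " + name);
    }
}

class ChecksumTypeSketch {
    public static void main(String[] args) {
        SketchChecksumType t = SketchChecksumType.nameToType("CRC32");
        System.out.println(t.getName());
    }
}
```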

REVISION DETAIL
  https://reviews.facebook.net/D1521


 support checksums in HBase block cache
 --

 Key: HBASE-5074
 URL: https://issues.apache.org/jira/browse/HBASE-5074
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: D1521.1.patch, D1521.1.patch, D1521.10.patch, 
 D1521.10.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch, 
 D1521.4.patch, D1521.4.patch, D1521.5.patch, D1521.5.patch, D1521.6.patch, 
 D1521.6.patch, D1521.7.patch, D1521.7.patch, D1521.8.patch, D1521.8.patch, 
 D1521.9.patch, D1521.9.patch


 The current implementation of HDFS stores the data in one block file and the 
 metadata(checksum) in another block file. This means that every read into the 
 HBase block cache actually consumes two disk iops, one to the datafile and 
 one to the checksum file. This is a major problem for scaling HBase, because 
 HBase is usually bottlenecked on the number of random disk iops that the 
 storage-hardware offers.
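
 To illustrate the description above: if the checksums live in the same block 
 as the data, a single read brings in both, and verification needs no second 
 file. The following is only a rough sketch under assumptions. The class name, 
 the 16 KB chunk size, and the layout (data followed by one 4-byte CRC32 per 
 chunk) are made up for illustration and are not the on-disk format of the patch.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public class InlineChecksumSketch {
    // Assumed chunk size for this sketch; not taken from the patch.
    static final int BYTES_PER_CHECKSUM = 16 * 1024;

    // Returns data followed by one 4-byte CRC32 value per chunk.
    static byte[] appendChecksums(byte[] data) {
        int chunks = (data.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        ByteBuffer out = ByteBuffer.allocate(data.length + 4 * chunks);
        out.put(data);
        CRC32 crc = new CRC32();
        for (int off = 0; off < data.length; off += BYTES_PER_CHECKSUM) {
            int len = Math.min(BYTES_PER_CHECKSUM, data.length - off);
            crc.reset();
            crc.update(data, off, len);
            out.putInt((int) crc.getValue());
        }
        return out.array();
    }

    // Verifies a block produced by appendChecksums; the block is assumed to
    // have been fetched in one read, so no second file is consulted.
    static boolean verify(byte[] block, int dataLength) {
        ByteBuffer sums = ByteBuffer.wrap(block, dataLength, block.length - dataLength);
        CRC32 crc = new CRC32();
        for (int off = 0; off < dataLength; off += BYTES_PER_CHECKSUM) {
            int len = Math.min(BYTES_PER_CHECKSUM, dataLength - off);
            crc.reset();
            crc.update(block, off, len);
            if (sums.getInt() != (int) crc.getValue()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] data = new byte[40000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        byte[] block = appendChecksums(data);
        System.out.println("intact block verifies: " + verify(block, data.length));
        block[5] ^= 1; // simulate corruption of a data byte
        System.out.println("corrupted block verifies: " + verify(block, data.length));
    }
}
```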

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
