Hi Gary,

Thanks for the comprehensive reply. It cleared up my doubts about Hadoop
security and the HBase + 0.20-security setup. I've set up the principals for
each server as you described.
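
In case it helps anyone else following the thread, this is roughly what I ran
in kadmin for each box (the host name and realm below are just placeholders
for our real ones):

  kadmin: addprinc -randkey hbase/[email protected]
  kadmin: ktadd -k hbase-server1.keytab hbase/[email protected]

and then pointed each daemon at its own keytab.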

Thanks for the warning; we'd like to stick with the ASF releases of Hadoop,
though. The project I'm working on is still in its early stages, so we can
live with some data loss. Most of our writes are done through bulk import. I
believe the worst case would be losing entries in the -ROOT- and .META.
tables (and the regions they point to). As a stop-gap I was thinking of
setting MEMSTORE_FLUSHSIZE=1 for these tables. It's still not safe, but the
window should be a lot smaller.
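
(For the record, the attribute can be set per-table from the shell, at least
for regular tables; whether the shell will let me do the same to -ROOT- and
.META. is something I still have to verify. On a user table it would look
something like this, with 'mytable' just a placeholder:

  hbase> disable 'mytable'
  hbase> alter 'mytable', METHOD => 'table_att', MEMSTORE_FLUSHSIZE => '1'
  hbase> enable 'mytable'

The value is in bytes, so 1 effectively forces a flush on every write.)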

-Francis      
 

On 6/20/11 11:18 AM, "Gary Helmling" <[email protected]> wrote:

> Hi Francis,
> 
> First a word of warning -- Hadoop 0.20.203 does not include the append
> support that HBase needs to avoid data loss in the case of region server
> failure.  I'd _strongly_ recommend you look at running CDH3 (which contains
> both append support and security) for the moment.  There may be an ASF
> Hadoop 0.20+security+append version release at some point, but there isn't
> one yet.
> 
> Back to the question, you would not want master and region servers to be
> identified as separate users on HDFS.  This would be bound to cause problems
> (or at least complications) with normal operations.
> 
> You _would_ want to have each server identified by a unique kerberos
> principal, however.  The default kerberos principal name form supported by
> secure Hadoop consists of 3 parts:
> 
> username/hostname@REALM
> 
> You can customize how this is parsed out if you have specific needs, but I
> haven't run into that myself.
> 
> Only the "username" portion is used by HDFS during access control checks.
> This is referred to as the "short user name" in the HDFS code.  Including
> hostname in the full kerberos principal prevents the KDC from seeing a
> normal cluster startup as a credential replay attack (and thus rejecting
> valid logins), among other things.
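> 
> If you do need to change that mapping, it's controlled by the
> hadoop.security.auth_to_local property in core-site.xml, which takes a list
> of translation rules.  Purely as an illustration (realm name made up), a
> rule that maps any hbase/<host> principal in that realm down to the short
> name "hbase" would look something like:
> 
>   <property>
>     <name>hadoop.security.auth_to_local</name>
>     <value>
>       RULE:[2:$1@$0]([email protected])s/@.*//
>       DEFAULT
>     </value>
>   </property>
> 
> The DEFAULT rule already just takes the first component of the principal,
> so for the setup below you shouldn't actually need a custom rule.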
> 
> So a configuration for an example cluster might be:
> 
> Server1:
> - running Master as hbase/[email protected]
> 
> Server2:
> - running Region Server as hbase/[email protected]
> 
> Server3:
> - running Region Server as hbase/[email protected]
> 
> ...
> 
> This way all HBase files in HDFS wind up being owned by the "hbase" user,
> and master can read region server logs, region servers can read version and
> cluster ID files, etc.
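> 
> On the HBase side, each daemon reads its principal and keytab from
> hbase-site.xml, and you can use _HOST in the principal so the same config
> file works on every server (it gets replaced with the local hostname at
> startup).  The property names below are from the security branch we're
> running, so double-check them against whatever version you end up on:
> 
>   <property>
>     <name>hbase.master.kerberos.principal</name>
>     <value>hbase/[email protected]</value>
>   </property>
>   <property>
>     <name>hbase.master.keytab.file</name>
>     <value>/etc/hbase/conf/hbase.keytab</value>
>   </property>
>   <property>
>     <name>hbase.regionserver.kerberos.principal</name>
>     <value>hbase/[email protected]</value>
>   </property>
>   <property>
>     <name>hbase.regionserver.keytab.file</name>
>     <value>/etc/hbase/conf/hbase.keytab</value>
>   </property>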
> 
> We've been running HBase with this type of configuration on secure Hadoop
> (though our own internal versions are a bit hacked up, to put it mildly),
> with good results for many months.
> 
> Hope this helps.
> 
> Gary
> 
> 
> 
> On Mon, Jun 20, 2011 at 10:55 AM, Francis Christopher Liu <
> [email protected]> wrote:
> 
>> Hi,
>> 
>> I'm working with HBase 0.90.3 and Hadoop 0.20.203. And I was wondering what
>> the reasons would be to have the master and region server be identified as
>> different users on HDFS? Is it recommended?
>> 
>> Thanks,
>> Francis
>> 
