Re: Re: hbase 'transparent encryption' feature is production ready or not?

2016-06-06 Thread Andrew Purtell
An HSM offloads the storage and use of critically sensitive key material. If instead 
that key material is stored in a local keystore, or on an NFS-mounted volume, then a 
node-level compromise gains access to both the key and the data for exfiltration. 
HSMs are separate hardware systems hardened against compromise, and 
cryptographic operations using keys stored on them are also offloaded to them. 
An attacker with node-level access won't be able to gain access to key material 
kept on the secure hardware. There's still a compromise, but not the keys to the 
kingdom, so to speak. This can be an important part of a defense-in-depth 
strategy. 
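
For reference, the out-of-the-box local keystore setup looks roughly like this in 
hbase-site.xml (a sketch only; the keystore path, password, and key alias below are 
placeholders):

  <!-- keystore path, password, and key alias are placeholders -->
  <property>
    <name>hbase.crypto.keyprovider</name>
    <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
  </property>
  <property>
    <name>hbase.crypto.keyprovider.parameters</name>
    <value>jceks:///etc/hbase/conf/hbase.jks?password=changeit</value>
  </property>
  <property>
    <name>hbase.crypto.master.key.name</name>
    <value>hbase-master-key</value>
  </property>

Swapping in an HSM means replacing that KeyProvider with one that talks to the 
device, so the master key never has to live on the node.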


> On Jun 6, 2016, at 6:23 PM, Liu, Ming (Ming)  wrote:
> 
> Thank you both Andrew and Dima,
> 
> It is very good to know the performance penalty is not too much, we will 
> investigate HDFS and HSM, and of course, we will test the perf impact 
> ourselves.
> I think I have misunderstanding of the purpose of encryption, if NFS doesn't 
> provide more protection. The major goal of encryption for me is when the data 
> is physically lost, one cannot read it if he cannot get the key. So unless 
> the NFS and the data disk lost to same person, it is safe. But I should 
> really start to read about HSM.
> 
> Very appreciated of your help.
> Ming
> 
> -Original Message-
> From: Andrew Purtell [mailto:andrew.purt...@gmail.com] 
> Sent: June 7, 2016 0:37
> To: user@hbase.apache.org
> Cc: Zhang, Yi (Eason) 
> Subject: Re: hbase 'transparent encryption' feature is production ready or not?
> 
>> if we move the encryption to HDFS level, we no longer can enable 
>> encryption per table I think? I assume encryption will impact 
>> performance to some extent, so we may would like to enable it per 
>> table
> 
> That's correct, at the HDFS level encryption will be over the entire HBase 
> data. I can offer my personal anecdote, which is HDFS encryption adds a 
> modest read penalty and a write penalty that's hard to measure. This can be 
> better than what you'll see using the default codecs provided with HBase 
> encryption because the HDFS implementation can use native code acceleration - 
> assuming you have the Hadoop native libraries properly built and available. 
> 
>> if for example, we can setup a separate storage, like a NFS, which can 
>> be mounted to each node of HBase cluster, and we put the key there, is 
>> it an acceptable plan
> 
> No, this won't provide any additional protection over using local keystore 
> files. 
> 
>> On Jun 6, 2016, at 9:07 AM, Liu, Ming (Ming)  wrote:
>> 
>> Hi, Andrew again,
>> 
>> I still have a question that if we move the encryption to HDFS level, we no 
>> longer can enable encryption per table I think? 
>> I assume encryption will impact performance to some extent, so we may would 
>> like to enable it per table. Is there any performance tests that shows how 
>> much overhead encryption can introduce? If very small, then I am very happy 
>> to do it in HDFS and encrypt all data.
>> I still not start to study HSM, but if for example, we can setup a separate 
>> storage, like a NFS, which can be mounted to each node of HBase cluster, and 
>> we put the key there, is it an acceptable plan?
>> 
>> Thanks,
>> Ming
>> 
>> -Original Message-
>> From: Andrew Purtell [mailto:apurt...@apache.org]
>> Sent: June 3, 2016 12:27
>> To: user@hbase.apache.org
>> Cc: Zhang, Yi (Eason) 
>> Subject: Re: Re: hbase 'transparent encryption' feature is production ready or not?
>> 
>>> We are now confident to use this feature.
>> 
>> You should test carefully for your use case in any case.
>> 
>>> HSM is a good option, I am new to it. But will look at it.
>> 
>> I recommend using HDFS's transparent encryption feature instead of 
>> HBase transparent encryption if you're only just now thinking about 
>> HSMs and key protection in general. Storing the master key on the same 
>> nodes as the encrypted data will defeat protection. This should be 
>> offloaded to a protected domain. Hadoop ships with a software KMS 
>> that, while it has limitations, can be set up on a specially secured 
>> server and HDFS TDE can take advantage of it. (HBase TDE doesn't 
>> support the Hadoop KMS.)
>> 
>> Advice offered for what it's worth (smile)
>> 
>> 
>>> On Thu, Jun 2, 2016 at 9:16 PM, Liu, Ming (Ming)  wrote:
>>> 
>>> Thank you Andrew!
>>> 
>>> What we hear must be rumor :-) We are now confident to use this feature.
>>> 
>>> HSM is a good option, I am new to it. But will look at it.
>>> 
>>> Thanks,
>>> Ming
>>> -Original Message-
>>> From: Andrew Purtell [mailto:apurt...@apache.org]
>>> Sent: June 3, 2016 8:59
>>> To: user@hbase.apache.org
>>> Cc: Zhang, Yi (Eason) 
>>> Subject: Re: hbase 'transparent encryption' feature is production ready or not?
>>> 
 We heard from various sources that it is not production ready before.
>>> 
>>> ​Said by whom, specifically? ​
>>> 
>>> ​> During our tests, we do find out 

Re: hbase 'transparent encryption' feature is production ready or not?

2016-06-06 Thread Liu, Ming (Ming)
Thank you both, Andrew and Dima,

It is very good to know the performance penalty is not too much; we will 
investigate HDFS encryption and HSMs, and of course we will test the performance 
impact ourselves.
I think I had a misunderstanding of the purpose of encryption, if NFS doesn't 
provide more protection. For me the major goal of encryption is that when the data 
is physically lost, nobody can read it without the key. So unless the NFS volume 
and the data disks are lost to the same person, the data is safe. But I should 
really start to read about HSMs.

Thank you very much for your help.
Ming

-Original Message-
From: Andrew Purtell [mailto:andrew.purt...@gmail.com] 
Sent: June 7, 2016 0:37
To: user@hbase.apache.org
Cc: Zhang, Yi (Eason) 
Subject: Re: hbase 'transparent encryption' feature is production ready or not?

> if we move the encryption to HDFS level, we no longer can enable 
> encryption per table I think? I assume encryption will impact 
> performance to some extent, so we may would like to enable it per 
> table

That's correct, at the HDFS level encryption will be over the entire HBase 
data. I can offer my personal anecdote, which is HDFS encryption adds a modest 
read penalty and a write penalty that's hard to measure. This can be better 
than what you'll see using the default codecs provided with HBase encryption 
because the HDFS implementation can use native code acceleration - assuming you 
have the Hadoop native libraries properly built and available. 

> if for example, we can setup a separate storage, like a NFS, which can 
> be mounted to each node of HBase cluster, and we put the key there, is 
> it an acceptable plan

No, this won't provide any additional protection over using local keystore 
files. 

> On Jun 6, 2016, at 9:07 AM, Liu, Ming (Ming)  wrote:
> 
> Hi, Andrew again,
> 
> I still have a question that if we move the encryption to HDFS level, we no 
> longer can enable encryption per table I think? 
> I assume encryption will impact performance to some extent, so we may would 
> like to enable it per table. Is there any performance tests that shows how 
> much overhead encryption can introduce? If very small, then I am very happy 
> to do it in HDFS and encrypt all data.
> I still not start to study HSM, but if for example, we can setup a separate 
> storage, like a NFS, which can be mounted to each node of HBase cluster, and 
> we put the key there, is it an acceptable plan?
> 
> Thanks,
> Ming
> 
> -Original Message-
> From: Andrew Purtell [mailto:apurt...@apache.org]
> Sent: June 3, 2016 12:27
> To: user@hbase.apache.org
> Cc: Zhang, Yi (Eason) 
> Subject: Re: Re: hbase 'transparent encryption' feature is production ready or not?
> 
>> We are now confident to use this feature.
> 
> You should test carefully for your use case in any case.
> 
>> HSM is a good option, I am new to it. But will look at it.
> 
> I recommend using HDFS's transparent encryption feature instead of 
> HBase transparent encryption if you're only just now thinking about 
> HSMs and key protection in general. Storing the master key on the same 
> nodes as the encrypted data will defeat protection. This should be 
> offloaded to a protected domain. Hadoop ships with a software KMS 
> that, while it has limitations, can be set up on a specially secured 
> server and HDFS TDE can take advantage of it. (HBase TDE doesn't 
> support the Hadoop KMS.)
> 
> Advice offered for what it's worth (smile)
> 
> 
>> On Thu, Jun 2, 2016 at 9:16 PM, Liu, Ming (Ming)  wrote:
>> 
>> Thank you Andrew!
>> 
>> What we hear must be rumor :-) We are now confident to use this feature.
>> 
>> HSM is a good option, I am new to it. But will look at it.
>> 
>> Thanks,
>> Ming
>> -Original Message-
>> From: Andrew Purtell [mailto:apurt...@apache.org]
>> Sent: June 3, 2016 8:59
>> To: user@hbase.apache.org
>> Cc: Zhang, Yi (Eason) 
>> Subject: Re: hbase 'transparent encryption' feature is production ready or not?
>> 
>>> We heard from various sources that it is not production ready before.
>> 
>> ​Said by whom, specifically? ​
>> 
>> ​> During our tests, we do find out it works not very stable, but 
>> probably due to our lack of experience of this feature.
>> 
>> If you have something repeatable, please consider filing a JIRA to 
>> report the problem.
>> 
>>> And, we now save the encryption key in the disk, so we were 
>>> wondering,
>> this is something not secure.
>> 
>> Data keys are encrypted with a master key which must be protected. 
>> The out of the box key provider stores the master key in a local keystore.
>> That's not sufficient protection. In a production environment you 
>> will want to use a HSM. Most (all?) HSMs support the keystore API. If 
>> that is not sufficient, our KeyProvider API is extensible for the 
>> solution you choose to employ in production.
>> 
>> ​Have you looked at HDFS transparent encryption?
>> 
>> https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs
>> 

Re: Rename tables or swap alias

2016-06-06 Thread Pat Ferrel
We implemented this by upserting changed elements and dropping others. On a 
given cluster it takes 4.5 hours to load HBase, while the trim and cleanup as 
currently implemented take 4 days. Back to the drawing board.

I’ve read the references but still don’t grok what to do. I have a table with 
an event stream, containing duplicates and expired data. I’d like to find the 
most time-efficient way to remove duplicates and drop expired data from what 
I’ll call the main_table. This is being queried and added to all the time.

My first thought was to create a new clean_table with Spark by reading 
main_table, processing, and writing clean_table, then renaming main_table to 
old_table and renaming clean_table to main_table. I could then drop old_table. 
Ignoring what happens to events during the renaming, this would be efficient 
because it would be equivalent to a load: no complex updates to tables in 
place and under load. 

Snapshots and clones seem to miss the issue, which is writing the cleaned data 
to some place that can now act like main_table, but clearly I don't understand 
snapshots and clones. They seem to be a way to alias a table so only changes 
are logged, without actually copying the data. I'm not sure I care about 
copying the data into an RDD, which will then undergo some transforms into a 
final RDD. That final RDD can be written efficiently into clean_table with no 
upserts or dropping of elements, which seem to be what causes things to slow to a halt.

So assuming I have clean_table, how do I get all queries to go to it instead 
of main_table? Elasticsearch has an alias that I can just point somewhere new. 
Do I need to keep track of something like this outside of HBase and change it 
after creating clean_table, or am I missing how to do this with snapshots and 
clones?
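
To make the question concrete, the swap I have in mind would look something like 
this in the hbase shell (just a sketch; the snapshot and table names are mine, and 
I realize none of these steps are atomic):

  # Spark has already written clean_table at this point
  snapshot 'main_table', 'main_table_backup'    # keep the dirty data around, just in case
  disable 'main_table'
  drop 'main_table'
  snapshot 'clean_table', 'clean_snap'
  clone_snapshot 'clean_snap', 'main_table'     # cleaned data now answers as main_table

Is that the intended use of snapshots and clones, or is there a cheaper way to 
point readers at clean_table?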



From: Ted Yu
Subject: Re: Rename tables or swap alias
Date: February 16, 2016 at 6:48:53 AM PST
To: user@hbase.apache.org
Reply-To: user@hbase.apache.org

Please see http://hbase.apache.org/book.html#ops.snapshots for background on 
snapshots.

In Anil's description, table_old is the result of cloning the snapshot
which is taken in step #1. See
http://hbase.apache.org/book.html#ops.snapshots.clone

Cheers

On Tue, Feb 16, 2016 at 6:35 AM, Pat Ferrel  wrote:

> I think I can work out the algorithm if I knew precisely what a “snapshot"
> does. From my reading it seems to be a lightweight fast alias (for lack of
> a better word) since it creates something that refers to the same physical
> data.So if I create a new table with cleaned data, call it table_new. Then
> I drop table_old and “snapshot” table_new into table_old? Is this what is
> suggested?
> 
> This leaves me with a small time where there is no table_old, which is the
> time between dropping table_old and creating a snapshot. Is it feasible to
> lock the DB for this time?
> 
>> On Feb 15, 2016, at 7:13 PM, Ted Yu  wrote:
>> 
>> Keep in mind that if the writes to this table are not paused, there would
>> be some data coming in between steps #1 and #2 which would not be in the
>> snapshot.
>> 
>> Cheers
>> 
>> On Mon, Feb 15, 2016 at 6:21 PM, Anil Gupta 
> wrote:
>> 
>>> I dont think there is any atomic operations in hbase to support ddl
> across
>>> 2 tables.
>>> 
>>> But, maybe you can use hbase snapshots.
>>> 1.Create a hbase snapshot.
>>> 2.Truncate the table.
>>> 3.Write data to the table.
>>> 4.Create a table from snapshot taken in step #1 as table_old.
>>> 
>>> Now you have two tables. One with current run data and other with last
> run
>>> data.
>>> I think above process will suffice. But, keep in mind that it is not
>>> atomic.
>>> 
>>> HTH,
>>> Anil
>>> Sent from my iPhone
>>> 
 On Feb 15, 2016, at 4:25 PM, Pat Ferrel  wrote:
 
 Any other way to do what I was asking. With Spark this is a very normal
>>> thing to treat a table as immutable and create another to replace the
> old.
 
 Can you lock two tables and rename them in 2 actions then unlock in a
>>> very short period of time?
 
 Or an alias for table names?
 
 Didn’t see these in any docs or Googling, any help is appreciated.
>>> Writing all this data back to the original table would be a huge load
> on a
>>> table being written to by external processes and therefore under large
> load
>>> to begin with.
 
> On Feb 14, 2016, at 5:03 PM, Ted Yu  wrote:
> 
> There is currently no native support for renaming two tables in one
>>> atomic
> action.
> 
> FYI
> 
>> On Sun, Feb 14, 2016 at 4:18 PM, Pat Ferrel 
>>> wrote:
>> 
>> I use Spark to 

Re: Enabling stripe compaction without disabling table

2016-06-06 Thread Bryan Beaudreault
Thanks Ted, I have seen that and I have had it set to true for years
without issue. I was asking in this case because the docs for stripe
compaction explicitly say to disable the table. I will test in our QA
environment first, but would also appreciate input from anyone who has done
this without disabling the table first, for better or worse.
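
For reference, the change I plan to try online is roughly this (a sketch based on 
the stripe compaction section of the book; the table name and the 
blocking-store-files value are placeholders):

  # 'my_table' and the value 100 are placeholders
  alter 'my_table', CONFIGURATION => {
    'hbase.hstore.engine.class' => 'org.apache.hadoop.hbase.regionserver.StripeStoreEngine',
    'hbase.hstore.blockingStoreFiles' => '100'
  }

Setting it per table at least keeps the rest of the cluster on the default store 
engine while we evaluate it.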

On Mon, Jun 6, 2016 at 3:26 PM Ted Yu  wrote:

> Have you seen the doc at the top
> of ./hbase-shell/src/main/ruby/shell/commands/alter.rb ?
>
> Alter a table. If the "hbase.online.schema.update.enable" property is set
> to
> false, then the table must be disabled (see help 'disable'). If the
> "hbase.online.schema.update.enable" property is set to true, tables can be
> altered without disabling them first. Altering enabled tables has caused
> problems
> in the past, so use caution and test it before using in production.
>
> FYI
>
> On Mon, Jun 6, 2016 at 12:19 PM, Bryan Beaudreault <
> bbeaudrea...@hubspot.com
> > wrote:
>
> > Hello,
> >
> > We're running hbase 1.2.0-cdh5.7.0. According to the HBase book, in order
> > to enable stripe compactions on a table we need to first disable the
> > table. We
> > basically can't disable tables in production. Is it possible to do this
> > without disabling the table?  If not, are there any plans to make this
> > doable?
> >
> > Thanks!
> >
>


Re: Enabling stripe compaction without disabling table

2016-06-06 Thread Ted Yu
Have you seen the doc at the top
of ./hbase-shell/src/main/ruby/shell/commands/alter.rb ?

Alter a table. If the "hbase.online.schema.update.enable" property is set to
false, then the table must be disabled (see help 'disable'). If the
"hbase.online.schema.update.enable" property is set to true, tables can be
altered without disabling them first. Altering enabled tables has caused
problems
in the past, so use caution and test it before using in production.
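
For reference, that is a site-level property, e.g. in hbase-site.xml (a sketch; the 
value shown is the permissive setting):

  <property>
    <name>hbase.online.schema.update.enable</name>
    <value>true</value>
  </property>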

FYI

On Mon, Jun 6, 2016 at 12:19 PM, Bryan Beaudreault  wrote:

> Hello,
>
> We're running hbase 1.2.0-cdh5.7.0. According to the HBase book, in order
> to enable stripe compactions on a table we need to first disable the
> table. We
> basically can't disable tables in production. Is it possible to do this
> without disabling the table?  If not, are there any plans to make this
> doable?
>
> Thanks!
>


Enabling stripe compaction without disabling table

2016-06-06 Thread Bryan Beaudreault
Hello,

We're running hbase 1.2.0-cdh5.7.0. According to the HBase book, in order
to enable stripe compactions on a table we need to first disable the table. We
basically can't disable tables in production. Is it possible to do this
without disabling the table?  If not, are there any plans to make this
doable?

Thanks!


Re: hbase 'transparent encryption' feature is production ready or not?

2016-06-06 Thread Andrew Purtell
> if we move the encryption to HDFS level, we no longer can enable encryption 
> per table I think? I assume encryption will impact performance to some 
> extent, so we may would like to enable it per table

That's correct: at the HDFS level, encryption covers the entire HBase data. I 
can offer my personal anecdote, which is that HDFS encryption adds a modest 
read penalty and a write penalty that's hard to measure. This can be better 
than what you'll see using the default codecs provided with HBase encryption, 
because the HDFS implementation can use native code acceleration - assuming you 
have the Hadoop native libraries properly built and available. 
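
For comparison, the HBase feature is scoped per column family with a schema 
attribute, something like this in the shell (a sketch; the table and family names 
are placeholders, and it assumes the crypto key provider is already configured on 
the cluster):

  # 'my_table' and 'cf1' are placeholders
  alter 'my_table', {NAME => 'cf1', ENCRYPTION => 'AES'}

That lets you pay the cost only for the tables that need it, but with the default 
Java codecs rather than the accelerated HDFS path.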

> if for example, we can setup a separate storage, like a NFS, which can be 
> mounted to each node of HBase cluster, and we put the key there, is it an 
> acceptable plan

No, this won't provide any additional protection over using local keystore 
files. 

> On Jun 6, 2016, at 9:07 AM, Liu, Ming (Ming)  wrote:
> 
> Hi, Andrew again,
> 
> I still have a question that if we move the encryption to HDFS level, we no 
> longer can enable encryption per table I think? 
> I assume encryption will impact performance to some extent, so we may would 
> like to enable it per table. Is there any performance tests that shows how 
> much overhead encryption can introduce? If very small, then I am very happy 
> to do it in HDFS and encrypt all data.
> I still not start to study HSM, but if for example, we can setup a separate 
> storage, like a NFS, which can be mounted to each node of HBase cluster, and 
> we put the key there, is it an acceptable plan?
> 
> Thanks,
> Ming
> 
> -Original Message-
> From: Andrew Purtell [mailto:apurt...@apache.org] 
> Sent: June 3, 2016 12:27
> To: user@hbase.apache.org
> Cc: Zhang, Yi (Eason) 
> Subject: Re: Re: hbase 'transparent encryption' feature is production ready or not?
> 
>> We are now confident to use this feature.
> 
> You should test carefully for your use case in any case.
> 
>> HSM is a good option, I am new to it. But will look at it.
> 
> I recommend using HDFS's transparent encryption feature instead of HBase 
> transparent encryption if you're only just now thinking about HSMs and key 
> protection in general. Storing the master key on the same nodes as the 
> encrypted data will defeat protection. This should be offloaded to a 
> protected domain. Hadoop ships with a software KMS that, while it has 
> limitations, can be set up on a specially secured server and HDFS TDE can 
> take advantage of it. (HBase TDE doesn't support the Hadoop KMS.)
> 
> Advice offered for what it's worth (smile)
> 
> 
>> On Thu, Jun 2, 2016 at 9:16 PM, Liu, Ming (Ming)  wrote:
>> 
>> Thank you Andrew!
>> 
>> What we hear must be rumor :-) We are now confident to use this feature.
>> 
>> HSM is a good option, I am new to it. But will look at it.
>> 
>> Thanks,
>> Ming
>> -Original Message-
>> From: Andrew Purtell [mailto:apurt...@apache.org]
>> Sent: June 3, 2016 8:59
>> To: user@hbase.apache.org
>> Cc: Zhang, Yi (Eason) 
>> Subject: Re: hbase 'transparent encryption' feature is production ready or not?
>> 
>>> We heard from various sources that it is not production ready before.
>> 
>> ​Said by whom, specifically? ​
>> 
>> ​> During our tests, we do find out it works not very stable, but 
>> probably due to our lack of experience of this feature.
>> 
>> If you have something repeatable, please consider filing a JIRA to 
>> report the problem.
>> 
>>> And, we now save the encryption key in the disk, so we were 
>>> wondering,
>> this is something not secure.
>> 
>> Data keys are encrypted with a master key which must be protected. The 
>> out of the box key provider stores the master key in a local keystore. 
>> That's not sufficient protection. In a production environment you will 
>> want to use a HSM. Most (all?) HSMs support the keystore API. If that 
>> is not sufficient, our KeyProvider API is extensible for the solution 
>> you choose to employ in production.
>> 
>> ​Have you looked at HDFS transparent encryption?
>> 
>> https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/
>> TransparentEncryption.html Because it works at the HDFS layer it's a 
>> more general solution. Be careful what version of Hadoop you use if 
>> opting for HDFS TDE, though. Pick the most recent release. Slightly 
>> older versions (like 2.6.0) had fatal bugs if used in conjunction with 
>> HBase.
>> 
>> 
>> 
>> On Thu, Jun 2, 2016 at 5:52 PM, Liu, Ming (Ming) 
>> wrote:
>> 
>>> Hi, all,
>>> 
>>> We are trying to deploy the 'transparent encryption' feature of 
>>> HBase , described in HBase reference guide:
>>> https://hbase.apache.org/book.html#hbase.encryption.server  , in our 
>>> product.
>>> We heard from various sources that it is not production ready before.
>>> 
>>> During our tests, we do find out it works not very stable, but 
>>> probably due to our lack of 

Re: hbase 'transparent encryption' feature is production ready or not?

2016-06-06 Thread Dima Spivak
FWIW, some engineers at Cloudera who worked on adding encryption at rest to
HDFS wrote a blog post on this where they describe negligible performance
impacts on write and only a slight performance degradation on large reads (
http://blog.cloudera.com/blog/2015/01/new-in-cdh-5-3-transparent-encryption-in-hdfs/).
Obviously, your mileage may vary, but in my internal testing, I can also
say I haven't seen much (if any) impact with encryption zones enabled.
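
If you want to try it out, putting the HBase root directory in an encryption zone 
is only a few commands once a KMS-backed key provider is configured (a rough 
sketch; the key name and path below are made up, and -createZone requires an empty 
directory, so plan to copy the data in afterwards):

  # key name and path are examples only
  hadoop key create hbase-zone-key
  hdfs crypto -createZone -keyName hbase-zone-key -path /hbase
  hdfs crypto -listZones

After that, everything HBase writes under that path is encrypted at rest with no 
change on the HBase side.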

-Dima

On Monday, June 6, 2016, Liu, Ming (Ming)  wrote:

> Hi, Andrew again,
>
> I still have a question that if we move the encryption to HDFS level, we
> no longer can enable encryption per table I think?
> I assume encryption will impact performance to some extent, so we may
> would like to enable it per table. Is there any performance tests that
> shows how much overhead encryption can introduce? If very small, then I am
> very happy to do it in HDFS and encrypt all data.
> I still not start to study HSM, but if for example, we can setup a
> separate storage, like a NFS, which can be mounted to each node of HBase
> cluster, and we put the key there, is it an acceptable plan?
>
> Thanks,
> Ming
>
> -Original Message-
> From: Andrew Purtell [mailto:apurt...@apache.org]
> Sent: June 3, 2016 12:27
> To: user@hbase.apache.org
> Cc: Zhang, Yi (Eason)
> Subject: Re: Re: hbase 'transparent encryption' feature is production ready or not?
>
> > We are now confident to use this feature.
>
> You should test carefully for your use case in any case.
>
> > HSM is a good option, I am new to it. But will look at it.
>
> I recommend using HDFS's transparent encryption feature instead of HBase
> transparent encryption if you're only just now thinking about HSMs and key
> protection in general. Storing the master key on the same nodes as the
> encrypted data will defeat protection. This should be offloaded to a
> protected domain. Hadoop ships with a software KMS that, while it has
> limitations, can be set up on a specially secured server and HDFS TDE can
> take advantage of it. (HBase TDE doesn't support the Hadoop KMS.)
>
> Advice offered for what it's worth (smile)
>
>
> On Thu, Jun 2, 2016 at 9:16 PM, Liu, Ming (Ming) wrote:
>
> > Thank you Andrew!
> >
> > What we hear must be rumor :-) We are now confident to use this feature.
> >
> > HSM is a good option, I am new to it. But will look at it.
> >
> > Thanks,
> > Ming
> > -Original Message-
> > From: Andrew Purtell [mailto:apurt...@apache.org]
> > Sent: June 3, 2016 8:59
> > To: user@hbase.apache.org
> > Cc: Zhang, Yi (Eason)
> > Subject: Re: hbase 'transparent encryption' feature is production ready or not?
> >
> > > We heard from various sources that it is not production ready before.
> >
> > ​Said by whom, specifically? ​
> >
> > ​> During our tests, we do find out it works not very stable, but
> > probably due to our lack of experience of this feature.
> >
> > If you have something repeatable, please consider filing a JIRA to
> > report the problem.
> >
> > > And, we now save the encryption key in the disk, so we were
> > > wondering,
> > this is something not secure.
> >
> > Data keys are encrypted with a master key which must be protected. The
> > out of the box key provider stores the master key in a local keystore.
> > That's not sufficient protection. In a production environment you will
> > want to use a HSM. Most (all?) HSMs support the keystore API. If that
> > is not sufficient, our KeyProvider API is extensible for the solution
> > you choose to employ in production.
> >
> > ​Have you looked at HDFS transparent encryption?
> >
> > https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/
> > TransparentEncryption.html Because it works at the HDFS layer it's a
> > more general solution. Be careful what version of Hadoop you use if
> > opting for HDFS TDE, though. Pick the most recent release. Slightly
> > older versions (like 2.6.0) had fatal bugs if used in conjunction with
> > HBase.
> >
> >
> >
> > On Thu, Jun 2, 2016 at 5:52 PM, Liu, Ming (Ming)
> > wrote:
> >
> > > Hi, all,
> > >
> > > We are trying to deploy the 'transparent encryption' feature of
> > > HBase , described in HBase reference guide:
> > > https://hbase.apache.org/book.html#hbase.encryption.server  , in our
> > > product.
> > > We heard from various sources that it is not production ready before.
> > >
> > > During our tests, we do find out it works not very stable, but
> > > probably due to our lack of experience of this feature. It works
> > > sometime, sometimes not work, and retry the same configuration, it
> > > work again. We were using HBase 1.0.
> > >
> > > Could anyone give us some information that this feature is already
> > > stable and can be used in a production environment?
> > >
> > > And, we now save the 

Re: hbase 'transparent encryption' feature is production ready or not?

2016-06-06 Thread Liu, Ming (Ming)
Hi Andrew, again,

I still have a question: if we move the encryption to the HDFS level, we can no 
longer enable encryption per table, I think?
I assume encryption will impact performance to some extent, so we may want to 
enable it per table. Are there any performance tests that show how much overhead 
encryption can introduce? If it is very small, then I am very happy to do it 
in HDFS and encrypt all data.
I have not yet started to study HSMs, but if, for example, we set up separate 
storage, like an NFS volume that can be mounted on each node of the HBase cluster, 
and we put the key there, is that an acceptable plan?

Thanks,
Ming

-Original Message-
From: Andrew Purtell [mailto:apurt...@apache.org] 
Sent: June 3, 2016 12:27
To: user@hbase.apache.org
Cc: Zhang, Yi (Eason) 
Subject: Re: Re: hbase 'transparent encryption' feature is production ready or not?

> We are now confident to use this feature.

You should test carefully for your use case in any case.

> HSM is a good option, I am new to it. But will look at it.

I recommend using HDFS's transparent encryption feature instead of HBase 
transparent encryption if you're only just now thinking about HSMs and key 
protection in general. Storing the master key on the same nodes as the 
encrypted data will defeat protection. This should be offloaded to a protected 
domain. Hadoop ships with a software KMS that, while it has limitations, can be 
set up on a specially secured server and HDFS TDE can take advantage of it. 
(HBase TDE doesn't support the Hadoop KMS.)

Advice offered for what it's worth (smile)


On Thu, Jun 2, 2016 at 9:16 PM, Liu, Ming (Ming)  wrote:

> Thank you Andrew!
>
> What we hear must be rumor :-) We are now confident to use this feature.
>
> HSM is a good option, I am new to it. But will look at it.
>
> Thanks,
> Ming
> -Original Message-
> From: Andrew Purtell [mailto:apurt...@apache.org]
> Sent: June 3, 2016 8:59
> To: user@hbase.apache.org
> Cc: Zhang, Yi (Eason) 
> Subject: Re: hbase 'transparent encryption' feature is production ready or not?
>
> > We heard from various sources that it is not production ready before.
>
> ​Said by whom, specifically? ​
>
> ​> During our tests, we do find out it works not very stable, but 
> probably due to our lack of experience of this feature.
>
> If you have something repeatable, please consider filing a JIRA to 
> report the problem.
>
> > And, we now save the encryption key in the disk, so we were 
> > wondering,
> this is something not secure.
>
> Data keys are encrypted with a master key which must be protected. The 
> out of the box key provider stores the master key in a local keystore. 
> That's not sufficient protection. In a production environment you will 
> want to use a HSM. Most (all?) HSMs support the keystore API. If that 
> is not sufficient, our KeyProvider API is extensible for the solution 
> you choose to employ in production.
>
> ​Have you looked at HDFS transparent encryption?
>
> https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/
> TransparentEncryption.html Because it works at the HDFS layer it's a 
> more general solution. Be careful what version of Hadoop you use if 
> opting for HDFS TDE, though. Pick the most recent release. Slightly 
> older versions (like 2.6.0) had fatal bugs if used in conjunction with 
> HBase.
>
>
>
> On Thu, Jun 2, 2016 at 5:52 PM, Liu, Ming (Ming) 
> wrote:
>
> > Hi, all,
> >
> > We are trying to deploy the 'transparent encryption' feature of 
> > HBase , described in HBase reference guide:
> > https://hbase.apache.org/book.html#hbase.encryption.server  , in our 
> > product.
> > We heard from various sources that it is not production ready before.
> >
> > During our tests, we do find out it works not very stable, but 
> > probably due to our lack of experience of this feature. It works 
> > sometime, sometimes not work, and retry the same configuration, it 
> > work again. We were using HBase 1.0.
> >
> > Could anyone give us some information that this feature is already 
> > stable and can be used in a production environment?
> >
> > And, we now save the encryption key in the disk, so we were 
> > wondering, this is something not secure. Since the key is at same 
> > place with data, anyone can decode the data because if he/she can 
> > access the data, he/she can access the key as well. Is there any 
> > best practice about how to manage the key?
> >
> > Thanks,
> > Ming
> >
> >
>
>
> --
> Best regards,
>
>- Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet 
> Hein (via Tom White)
>



--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via 
Tom White)


Re: Some regions never get online after a region server crashes

2016-06-06 Thread Shuai Lin
Would do that, thanks!

On Sat, Jun 4, 2016 at 6:55 PM, Ted Yu  wrote:

> I think this sounds like a bug.
>
> Search in HBase JIRA first. If there is no JIRA with the same symptom,
> consider filing one.
>
> Cheers
>
> On Fri, Jun 3, 2016 at 1:10 AM, Shuai Lin  wrote:
>
>> Hi Ted,
>>
>> I'm kind of confused, so is this normal behaviour or a bug of hbase? For
>> me it looks like a bug, should I fire a JIRA?
>>
>> Thanks,
>>
>> Shuai
>>
>> On Fri, May 27, 2016 at 8:02 PM, Ted Yu  wrote:
>>
>>> There were 7 regions Master tried to close which were opening but not
>>> yet served.
>>>
>>> d1c7f3f455f2529da82a2f713b5ee067 was one of them.
>>>
>>> On Fri, May 27, 2016 at 12:47 AM, Shuai Lin 
>>> wrote:
>>>
 Here is the complete log on node6 between 13:10:47 and 13:11:47:
 http://paste.openstack.org/raw/505826/

 The master asked node6 to open several regions. Node6 opened the first
 4 very fast (within 1 seconsd) and got stuck at the 5th one. But there is
 no errors at that time.

 On Wed, May 25, 2016 at 10:12 PM, Ted Yu  wrote:

> In AssignmentManager#assign(), you should find:
>
>   // Send OPEN RPC. If it fails on a IOE or RemoteException,
>   // regions will be assigned individually.
>   long maxWaitTime = System.currentTimeMillis() +
> this.server.getConfiguration().
>   getLong("hbase.regionserver.rpc.startup.waittime",
> 6);
>
> BTW can you see what caused rs-node6 to not respond around 13:11:47 ?
>
> Cheers
>
> On Fri, May 20, 2016 at 6:20 AM, Shuai Lin 
> wrote:
>
>> Because of the "opening regions" rpc call sent by master to the
>> region server node6 got timed out after 1 minutes?
>>
>> *RPC call was sent:*
>>
>> 2016-04-30 13:10:47,702 INFO 
>> org.apache.hadoop.hbase.master.AssignmentManager:
>> Assigning 22 region(s) tors-node6.example.com,60020,1458723856883
>>
>> *After 1 minute:*
>>
>> 2016-04-30 13:11:47,780 INFO
>> org.apache.hadoop.hbase.master.AssignmentManager: Unable to communicate
>> with rs-node6.example.com,60020,1458723856883 in order to assign
>> regions, java.io.IOException: Call to
>> rs-node6.example.com/172.16.6.6:60020 failed on local exception:
>> org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=4,
>> waitTime=60001, operationTimeout=6 expired.
>>
>> 2016-04-30 13:11:47,783 DEBUG
>> org.apache.hadoop.hbase.master.AssignmentManager: Force region state
>> offline {d1c7f3f455f2529da82a2f713b5ee067 state=PENDING_OPEN,
>> ts=1462021847743, server=rs-node6.example.com,60020,1458723856883}
>>
>>
>> I have checked hbase source code, but don't find any specific timeout
>> settings for "open region" rpc call I can use. So I guess the it's using
>> the default "hbase.rpc.timeout", which defaults to 60secs. And since 
>> there
>> are 20+ regions being assigned to node6 almost at the same moment, node6
>> gets overloaded and can't finish opening all of them within one minute.
>>
>> So this looks like a hbase bug to me (regions never get online when
>> the region server failed to handle the OpenRegionRequest before the rpc
>> timeout), am I right?
>>
>>
>> On Fri, May 20, 2016 at 12:42 PM, Ted Yu  wrote:
>>
>>> Looks like region d1c7f3f455f2529da82a2f713b5ee067 received CLOSE
>>> request
>>> when it was opening, leading to RegionAlreadyInTransitionException.
>>>
>>> Was there any clue in master log why the close request was sent ?
>>>
>>> Cheers
>>>
>>> On Wed, May 4, 2016 at 8:02 PM, Shuai Lin 
>>> wrote:
>>>
>>> > Hi Ted,
>>> >
>>> > The hbase version is 1.0.0-cdh5.4.8, shipped with cloudera CDH
>>> 5.4.8. The
>>> > RS logs on node6 can be found here <
>>> http://paste.openstack.org/raw/496174/
>>> > >
>>> >  .
>>> >
>>> > Thanks!
>>> >
>>> > Shuai
>>> >
>>> > On Thu, May 5, 2016 at 9:15 AM, Ted Yu 
>>> wrote:
>>> >
>>> > > Can you pastebin related server log w.r.t.
>>> > d1c7f3f455f2529da82a2f713b5ee067
>>> > > from rs-node6 ?
>>> > >
>>> > > Which release of hbase are you using ?
>>> > >
>>> > > Cheers
>>> > >
>>> > > On Wed, May 4, 2016 at 6:07 PM, Shuai Lin <
>>> linshuai2...@gmail.com>
>>> > wrote:
>>> > >
>>> > > > Hi list,
>>> > > >
>>> > > > Last weekend I got a region server crashed, but some regions
>>> never got
>>> > > > online again on other RSes. I've gone through the logs, and
>>> here is the
>>> > > > timeline about some of the events:
>>> > > >
>>> > > > * 13:03:50 on of 

RE: IndexOutOfBoundsException during retrieving region split point

2016-06-06 Thread Pankaj kr
Thanks Anoop for replying...
Yes, in our test environment numDataIndexLevels=2 as well.
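(We read those fields with the HFile tool, roughly like this; the actual store 
file path is omitted here and the <> components are placeholders:

  # path components in <> are placeholders
  hbase org.apache.hadoop.hbase.io.hfile.HFile -m -f /hbase/data/<namespace>/<table>/<region>/<cf>/<storefile>

The -m option prints the trailer and meta information, including 
numDataIndexLevels and the midkey.)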

In the production environment there were successful region splits after compaction; 
only a few region splits failed with the same error.

Regards,
Pankaj

-Original Message-
From: Anoop John [mailto:anoop.hb...@gmail.com] 
Sent: Monday, June 06, 2016 3:49 PM
To: user@hbase.apache.org
Subject: Re: IndexOutOfBoundsException during retrieving region split point

In your test env do you also have numDataIndexLevels=2? Or is it only 1?

-Anoop-

On Mon, Jun 6, 2016 at 1:12 PM, Pankaj kr  wrote:
> Thanks Ted for replying.
> Yeah, We have a plan to upgrade. But currently I want to know the reason 
> behind this. I tried to reproduce this in our test environment but didn’t 
> happen.
>
> in HFilePrettyPrinter output "numDataIndexLevels=2", so there were multilevel 
> data index. Is which circumstances this problem can happen?
>
> Regards,
> Pankaj
>
> -Original Message-
> From: Ted Yu [mailto:yuzhih...@gmail.com]
> Sent: Saturday, June 04, 2016 12:16 AM
> To: user@hbase.apache.org
> Cc: bhupendra jain; Sharanabasappa G Keriwaddi
> Subject: Re: IndexOutOfBoundsException during retrieving region split 
> point
>
> 1.0.0 is quite old.
>
> Is it possible to upgrade to 1.1 or 1.2 release ?
>
> Thanks
>
> On Fri, Jun 3, 2016 at 8:12 AM, Pankaj kr  wrote:
>
>> Hi,
>>
>> We met a weird scenario in our production environment.
>> IndexOutOfBoundsException is thrown while retrieving mid key of the 
>> storefile after region compaction.
>>
>> Log Snippet :
>> -
>> 2016-05-30 01:41:58,484 | INFO  |
>> regionserver/RS-HOSTNAME/RS-IP:21302-longCompactions-1464247799749 | 
>> Completed compaction of 1 (all) file(s) in CF of 
>> User_Namespace:User_Table,100050007010803_20140126_308010717550001_756781_99_36_0_01,1464543296529.676ee7e9902c066b0e8c15745463d3c5.
>> into eee1f433635d478197b212e2e378fce8(size=22.0 G), total size for 
>> store is
>> 22.0 G. This selection was in queue for 0sec, and took 6mins, 25sec 
>> to execute. | 
>> org.apache.hadoop.hbase.regionserver.HStore.logCompactionEndMessage(H
>> S
>> tore.java:1356)
>> 2016-05-30 01:41:58,485 | INFO  |
>> regionserver/RS-HOSTNAME/RS-IP:21302-longCompactions-1464247799749 | 
>> Completed compaction: Request =
>> regionName=User_Namespace:User_Table,100050007010803_20140126_3080107
>> 1
>> 7550001_756781_99_36_0_01,1464543296529.676ee7e9902c066b0e8c15745463d
>> 3 c5., storeName=CF, fileCount=1, fileSize=44.0 G, priority=6, 
>> time=295643974900644; duration=6mins, 25sec | 
>> org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRun
>> n
>> er.run(CompactSplitThread.java:544)
>> 2016-05-30 01:41:58,529 | ERROR |
>> regionserver/RS-HOSTNAME/RS-IP:21302-longCompactions-1464247799749 | 
>> Compaction failed Request =
>> regionName=User_Namespace:User_Table,100050007010803_20140126_3080107
>> 1
>> 7550001_756781_99_36_0_01,1464543296529.676ee7e9902c066b0e8c15745463d
>> 3 c5., storeName=CF, fileCount=1, fileSize=44.0 G, priority=6,
>> time=295643974900644 |
>> org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRun
>> n
>> er.run(CompactSplitThread.java:563)
>> java.lang.IndexOutOfBoundsException
>> at java.nio.Buffer.checkIndex(Buffer.java:540)
>> at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:139)
>> at
>> org.apache.hadoop.hbase.util.ByteBufferUtils.toBytes(ByteBufferUtils.java:490)
>> at
>> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.midkey(HFileBlockIndex.java:349)
>> at
>> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.midkey(HFileReaderV2.java:512)
>> at
>> org.apache.hadoop.hbase.regionserver.StoreFile$Reader.midkey(StoreFile.java:1480)
>> at
>> org.apache.hadoop.hbase.regionserver.StoreFile.getFileSplitPoint(StoreFile.java:685)
>> at
>> org.apache.hadoop.hbase.regionserver.DefaultStoreFileManager.getSplitPoint(DefaultStoreFileManager.java:126)
>> at
>> org.apache.hadoop.hbase.regionserver.HStore.getSplitPoint(HStore.java:1986)
>> at
>> org.apache.hadoop.hbase.regionserver.RegionSplitPolicy.getSplitPoint(RegionSplitPolicy.java:82)
>> at
>> org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:7914)
>> at
>> org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestSplit(CompactSplitThread.java:240)
>> at
>> org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:552)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>>
>> -
>> Observation:
>> >> HFilePrettyPrinter also print the 

Re: IndexOutOfBoundsException during retrieving region split point

2016-06-06 Thread Anoop John
In your test env do you also have numDataIndexLevels=2? Or is it only 1?

-Anoop-

On Mon, Jun 6, 2016 at 1:12 PM, Pankaj kr  wrote:
> Thanks Ted for replying.
> Yeah, We have a plan to upgrade. But currently I want to know the reason 
> behind this. I tried to reproduce this in our test environment but didn’t 
> happen.
>
> in HFilePrettyPrinter output "numDataIndexLevels=2", so there were multilevel 
> data index. Is which circumstances this problem can happen?
>
> Regards,
> Pankaj
>
> -Original Message-
> From: Ted Yu [mailto:yuzhih...@gmail.com]
> Sent: Saturday, June 04, 2016 12:16 AM
> To: user@hbase.apache.org
> Cc: bhupendra jain; Sharanabasappa G Keriwaddi
> Subject: Re: IndexOutOfBoundsException during retrieving region split point
>
> 1.0.0 is quite old.
>
> Is it possible to upgrade to 1.1 or 1.2 release ?
>
> Thanks
>
> On Fri, Jun 3, 2016 at 8:12 AM, Pankaj kr  wrote:
>
>> Hi,
>>
>> We met a weird scenario in our production environment.
>> IndexOutOfBoundsException is thrown while retrieving mid key of the
>> storefile after region compaction.
>>
>> Log Snippet :
>> -
>> 2016-05-30 01:41:58,484 | INFO  |
>> regionserver/RS-HOSTNAME/RS-IP:21302-longCompactions-1464247799749 |
>> Completed compaction of 1 (all) file(s) in CF of
>> User_Namespace:User_Table,100050007010803_20140126_308010717550001_756781_99_36_0_01,1464543296529.676ee7e9902c066b0e8c15745463d3c5.
>> into eee1f433635d478197b212e2e378fce8(size=22.0 G), total size for
>> store is
>> 22.0 G. This selection was in queue for 0sec, and took 6mins, 25sec to
>> execute. |
>> org.apache.hadoop.hbase.regionserver.HStore.logCompactionEndMessage(HS
>> tore.java:1356)
>> 2016-05-30 01:41:58,485 | INFO  |
>> regionserver/RS-HOSTNAME/RS-IP:21302-longCompactions-1464247799749 |
>> Completed compaction: Request =
>> regionName=User_Namespace:User_Table,100050007010803_20140126_30801071
>> 7550001_756781_99_36_0_01,1464543296529.676ee7e9902c066b0e8c15745463d3
>> c5., storeName=CF, fileCount=1, fileSize=44.0 G, priority=6,
>> time=295643974900644; duration=6mins, 25sec |
>> org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunn
>> er.run(CompactSplitThread.java:544)
>> 2016-05-30 01:41:58,529 | ERROR |
>> regionserver/RS-HOSTNAME/RS-IP:21302-longCompactions-1464247799749 |
>> Compaction failed Request =
>> regionName=User_Namespace:User_Table,100050007010803_20140126_30801071
>> 7550001_756781_99_36_0_01,1464543296529.676ee7e9902c066b0e8c15745463d3
>> c5., storeName=CF, fileCount=1, fileSize=44.0 G, priority=6,
>> time=295643974900644 |
>> org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunn
>> er.run(CompactSplitThread.java:563)
>> java.lang.IndexOutOfBoundsException
>> at java.nio.Buffer.checkIndex(Buffer.java:540)
>> at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:139)
>> at
>> org.apache.hadoop.hbase.util.ByteBufferUtils.toBytes(ByteBufferUtils.java:490)
>> at
>> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.midkey(HFileBlockIndex.java:349)
>> at
>> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.midkey(HFileReaderV2.java:512)
>> at
>> org.apache.hadoop.hbase.regionserver.StoreFile$Reader.midkey(StoreFile.java:1480)
>> at
>> org.apache.hadoop.hbase.regionserver.StoreFile.getFileSplitPoint(StoreFile.java:685)
>> at
>> org.apache.hadoop.hbase.regionserver.DefaultStoreFileManager.getSplitPoint(DefaultStoreFileManager.java:126)
>> at
>> org.apache.hadoop.hbase.regionserver.HStore.getSplitPoint(HStore.java:1986)
>> at
>> org.apache.hadoop.hbase.regionserver.RegionSplitPolicy.getSplitPoint(RegionSplitPolicy.java:82)
>> at
>> org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:7914)
>> at
>> org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestSplit(CompactSplitThread.java:240)
>> at
>> org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:552)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>>
>> -
>> Observation:
>> >> HFilePrettyPrinter also print the message "Unable to retrieve the
>> midkey" for the mid key.
>> >> HDFS fsck report the hfile healthy.
>>
>> Though successful region compaction were also there, only few region
>> compaction failed with same error.
>>
>> Have anyone faced this issue? Any help will be appreciated.
>> HBase version is 1.0.0.
>>
>> Regards,
>> Pankaj
>>


RE: IndexOutOfBoundsException during retrieving region split point

2016-06-06 Thread Pankaj kr
Thanks Ted for replying. 
Yeah, we have a plan to upgrade. But currently I want to know the reason behind 
this. I tried to reproduce it in our test environment, but it didn't happen.

In the HFilePrettyPrinter output "numDataIndexLevels=2", so there was a multilevel 
data index. In which circumstances can this problem happen?

Regards,
Pankaj

-Original Message-
From: Ted Yu [mailto:yuzhih...@gmail.com] 
Sent: Saturday, June 04, 2016 12:16 AM
To: user@hbase.apache.org
Cc: bhupendra jain; Sharanabasappa G Keriwaddi
Subject: Re: IndexOutOfBoundsException during retrieving region split point

1.0.0 is quite old.

Is it possible to upgrade to 1.1 or 1.2 release ?

Thanks

On Fri, Jun 3, 2016 at 8:12 AM, Pankaj kr  wrote:

> Hi,
>
> We met a weird scenario in our production environment.
> IndexOutOfBoundsException is thrown while retrieving mid key of the 
> storefile after region compaction.
>
> Log Snippet :
> -
> 2016-05-30 01:41:58,484 | INFO  |
> regionserver/RS-HOSTNAME/RS-IP:21302-longCompactions-1464247799749 | 
> Completed compaction of 1 (all) file(s) in CF of 
> User_Namespace:User_Table,100050007010803_20140126_308010717550001_756781_99_36_0_01,1464543296529.676ee7e9902c066b0e8c15745463d3c5.
> into eee1f433635d478197b212e2e378fce8(size=22.0 G), total size for 
> store is
> 22.0 G. This selection was in queue for 0sec, and took 6mins, 25sec to 
> execute. |
> org.apache.hadoop.hbase.regionserver.HStore.logCompactionEndMessage(HS
> tore.java:1356)
> 2016-05-30 01:41:58,485 | INFO  |
> regionserver/RS-HOSTNAME/RS-IP:21302-longCompactions-1464247799749 | 
> Completed compaction: Request = 
> regionName=User_Namespace:User_Table,100050007010803_20140126_30801071
> 7550001_756781_99_36_0_01,1464543296529.676ee7e9902c066b0e8c15745463d3
> c5., storeName=CF, fileCount=1, fileSize=44.0 G, priority=6, 
> time=295643974900644; duration=6mins, 25sec |
> org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunn
> er.run(CompactSplitThread.java:544)
> 2016-05-30 01:41:58,529 | ERROR |
> regionserver/RS-HOSTNAME/RS-IP:21302-longCompactions-1464247799749 | 
> Compaction failed Request = 
> regionName=User_Namespace:User_Table,100050007010803_20140126_30801071
> 7550001_756781_99_36_0_01,1464543296529.676ee7e9902c066b0e8c15745463d3
> c5., storeName=CF, fileCount=1, fileSize=44.0 G, priority=6,
> time=295643974900644 |
> org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunn
> er.run(CompactSplitThread.java:563)
> java.lang.IndexOutOfBoundsException
> at java.nio.Buffer.checkIndex(Buffer.java:540)
> at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:139)
> at
> org.apache.hadoop.hbase.util.ByteBufferUtils.toBytes(ByteBufferUtils.java:490)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.midkey(HFileBlockIndex.java:349)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.midkey(HFileReaderV2.java:512)
> at
> org.apache.hadoop.hbase.regionserver.StoreFile$Reader.midkey(StoreFile.java:1480)
> at
> org.apache.hadoop.hbase.regionserver.StoreFile.getFileSplitPoint(StoreFile.java:685)
> at
> org.apache.hadoop.hbase.regionserver.DefaultStoreFileManager.getSplitPoint(DefaultStoreFileManager.java:126)
> at
> org.apache.hadoop.hbase.regionserver.HStore.getSplitPoint(HStore.java:1986)
> at
> org.apache.hadoop.hbase.regionserver.RegionSplitPolicy.getSplitPoint(RegionSplitPolicy.java:82)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:7914)
> at
> org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestSplit(CompactSplitThread.java:240)
> at
> org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:552)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
> -
> Observation:
> >> HFilePrettyPrinter also print the message "Unable to retrieve the
> midkey" for the mid key.
> >> HDFS fsck report the hfile healthy.
>
> Though successful region compaction were also there, only few region 
> compaction failed with same error.
>
> Have anyone faced this issue? Any help will be appreciated.
> HBase version is 1.0.0.
>
> Regards,
> Pankaj
>