RE: HBase namespaces and encryption

2017-03-13 Thread Roberta Marton
Thanks, I was aware of the issue with WALs but had not considered bulkload and 
archive.

 Roberta



Re: limiting user threads on client

2017-03-13 Thread Enis Söztutar
There are different thread pools in the client, and some of them depend on
how you are constructing connection and table instances.

The first thread pool is the one owned by the connection. If you are using
ConnectionFactory.createConnection() (which you should), then this is the
property that controls the maximum number of threads in the connection's pool:

hbase.hconnection.threads.max

This one configures how long idle threads are kept before they are discarded:

hbase.hconnection.threads.keepalivetime

You can also give your own thread pool to the Connection object if you want
to control threading behavior.
If you are creating HTable or Table objects from Connection, then by
default they share the same thread pool, so you do not have to do anything.
Otherwise, the HTable objects can have their own thread pools as well.
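
A minimal sketch, using only the JDK, of the kind of bounded pool that could
be handed to the Connection. The class name BoundedClientPool and the sizes
(4 threads, 10 seconds) are made up for illustration. The hand-off itself is
only shown in a comment, assuming the HBase 1.x client is on the classpath
and that its ConnectionFactory.createConnection(Configuration, ExecutorService)
overload is used.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of HBase: builds a pool with a hard thread cap.
final class BoundedClientPool {

    static ThreadPoolExecutor create(int maxThreads, long keepAliveSeconds) {
        // core == max with an unbounded queue: extra work waits in the queue
        // instead of causing more threads to be created.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads,
                keepAliveSeconds, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>());
        // Let idle threads exit after the keep-alive time (which must be > 0).
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = create(4, 10);
        // Submit far more tasks than threads; the cap still holds.
        for (int i = 0; i < 100; i++) {
            pool.execute(() -> { });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        System.out.println("largest pool size: " + pool.getLargestPoolSize());

        // Assumed usage with the HBase client (not compiled here):
        //   Connection conn = ConnectionFactory.createConnection(conf, pool);
        // A pool supplied this way would be owned by the caller, who would
        // also be the one to shut it down.
    }
}
```

The trade-off of a small cap is that batched writes queue up behind the
available threads rather than fanning out, so throughput may drop while the
thread count stays predictable.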

Then, there are RPC-level thread pools. In 1.x versions (unless you have
Netty-based async RPC), there is one thread per regionserver that the
client talks to. I don't think there is a limit on how many of these the
client can have at a single time. So, if the client ends up doing RPCs to
many servers, there will be one thread per server.

You should probably use jstack or kill -3 to inspect the HBase client
threads.

Enis


Re: limiting user threads on client

2017-03-13 Thread anil gupta
I think you need to set that property before you create the
HBaseConfiguration object. Have you tried that?



-- 
Thanks & Regards,
Anil Gupta


Re: HBase namespaces and encryption

2017-03-13 Thread Esteban Gutierrez
Hello Roberta,

I think there are too many caveats to using an encryption zone per
namespace. In particular, bulk loading and the archive will fail. Another
problem is the security guarantees that you are offering to your
tenants: the WALs and oldWALs directories will have to be under the same
encryption zone, not to mention that data from multiple tenants can
potentially be written to the same WAL file, and that could be difficult to
justify to your tenants. The simplest approach I can think of is to have 3
different HBase clusters, each one pointing to its own encryption zone
on a different hbase.rootdir. It will be necessary to configure
hbase.tmp.dir, zookeeper.znode.parent, port numbers, etc., but I think that
is much simpler than specifying an encryption zone per namespace.

thanks!
esteban.




--
Cloudera, Inc.


On Fri, Mar 10, 2017 at 3:11 PM, Roberta Marton wrote:

> I have been researching how our product can use namespaces to aid in
> multi-tenancy support.
> For example, I have 3 tenants that need to be isolated from each other.
> Ideally, each tenant would have its own namespace and its own set of
> permissions applied.
> What I would also like to do is integrate HDFS encryption with
> namespaces. That is, each namespace would reside in its own encryption
> zone and only be accessible through that zone's encryption key.
>
> Is this possible? From the documentation I have been reading, it is
> recommended that all HBase data be in a single encryption zone. So this
> would preclude the ability to create different zones for each namespace.
>
> If this is not possible, are there any plans to add this support in the
> future?
>
> If we can't use HDFS encryption zones, is there any way to isolate each
> tenant's data through some other encryption mechanism?
>
> Regards,
> Roberta Marton
>


Re: limiting user threads on client

2017-03-13 Thread Henning Blohm

Unfortunately it doesn't seem to make a difference.

I see that the configuration has hbase.htable.threads.max=1 right before
setting up the Connection, but then I still get hundreds of

hconnection-***

threads. Is that actually ZooKeeper?

Thanks,
Henning


Re: limiting user threads on client

2017-03-13 Thread Henning Blohm

It's that simple...? Thanks so much! Will give it a try right away.

Thanks, Henning


Re: limiting user threads on client

2017-03-13 Thread Ted Yu
Are you using the Java client?
See the following in HTable:

  public static ThreadPoolExecutor getDefaultExecutor(Configuration conf) {
    int maxThreads = conf.getInt("hbase.htable.threads.max", Integer.MAX_VALUE);

FYI



limiting user threads on client

2017-03-13 Thread Henning Blohm

Hi,

I am running an HBase client on a very resource-limited machine. In
particular, numproc is limited so that I frequently get "Cannot create
native thread" OOMs. I noticed that, particularly in write situations,
the hconnection pool grows into the hundreds of threads, even when
writing with fewer than ten application threads. Threads are
discarded again after some minutes.

In conjunction with other programs running on that machine, this
sometimes leads to an "overload" situation.

Is there a way to keep thread pool usage limited, or at least in some closer
relation to the actual concurrency required?


Thanks,

Henning




HBase + Spark join

2017-03-13 Thread Kristoffer Sjögren
Hi

I want to join a Spark RDD with an HBase table. I'm familiar with the
different connectors available but couldn't find this functionality.

The idea I have is to first sort the RDD according to a byte[] key [1]
and then use rdd.mapPartitions, so that each partition contains a unique
and sequentially sorted range of keys that lines up with the key order in
HBase.

I should mention that the RDD will always contain almost all the keys
that are stored in HBase, so full table scans are fine.

Unfortunately, Spark cannot sort native Java byte[]. I'm also not
sure whether mapPartitions really maintains the total sort order of the
original RDD.

Any suggestions?

Cheers,
-Kristoffer

[1] Guava UnsignedBytes.lexicographicalComparator
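
One possible way around the byte[] sorting problem, sketched with the JDK
only: wrap each key in a small Comparable and Serializable class that
compares bytes as unsigned values, lexicographically, with a shorter prefix
sorting first. That is the same ordering as the Guava comparator in [1].
The class name RowKey is made up for illustration, and plugging such a key
into a Spark sort (for example a pair RDD sorted by key) is an assumption
that is not shown or tested here.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical wrapper: gives byte[] keys value equality and an unsigned
// lexicographic ordering.
final class RowKey implements Comparable<RowKey>, Serializable {
    private static final long serialVersionUID = 1L;
    private final byte[] bytes;

    RowKey(byte[] bytes) {
        this.bytes = bytes.clone(); // defensive copy keeps the key immutable
    }

    byte[] get() {
        return bytes.clone();
    }

    @Override
    public int compareTo(RowKey other) {
        int n = Math.min(bytes.length, other.bytes.length);
        for (int i = 0; i < n; i++) {
            int a = bytes[i] & 0xff;       // treat each byte as 0..255
            int b = other.bytes[i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return bytes.length - other.bytes.length; // shorter prefix sorts first
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof RowKey && Arrays.equals(bytes, ((RowKey) o).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }

    public static void main(String[] args) {
        List<RowKey> keys = new ArrayList<>();
        keys.add(new RowKey(new byte[] {(byte) 0x80}));
        keys.add(new RowKey(new byte[] {0x01, 0x02}));
        keys.add(new RowKey(new byte[] {0x01}));
        Collections.sort(keys);
        // Order after sorting: {0x01}, {0x01, 0x02}, {0x80}
        for (RowKey k : keys) {
            System.out.println(Arrays.toString(k.get()));
        }
    }
}
```

Note that Java prints 0x80 as -128, because byte is signed. The unsigned
comparison above is what places it last.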