RE: HBase namespaces and encryption
Thanks, I was aware of the issue with WALs but had not considered bulkload and the archive.

   Roberta

-----Original Message-----
From: Esteban Gutierrez [mailto:este...@cloudera.com]
Sent: Monday, March 13, 2017 10:50 AM
To: user@hbase.apache.org
Subject: Re: HBase namespaces and encryption

Hello Roberta,

I think there are too many caveats to using an encryption zone per namespace. In particular, bulkloading and the archive will fail. Another problem is the security guarantees you are offering to your tenants: the WALs and oldWALs directories will have to be under the same encryption zone, not to mention that data from multiple tenants can potentially be written to the same WAL file, which could be difficult to justify to your tenants.

The simplest approach I can think of is to have 3 different HBase clusters, each pointing to its own encryption zone on a different hbase.rootdir. It will be necessary to configure hbase.tmp.dir, zookeeper.znode.parent, port numbers, etc., but I think that is much simpler than specifying an encryption zone per namespace.

thanks!
esteban.

--
Cloudera, Inc.

On Fri, Mar 10, 2017 at 3:11 PM, Roberta Marton wrote:
> I have been researching how our product can use namespaces to aid in
> multi-tenancy support. For example, I have 3 tenants that need to be
> isolated from each other. Ideally, each tenant would have its own
> namespace and its own set of permissions applied.
>
> What I would also like to do is integrate HDFS encryption with
> namespaces. That is, each namespace would reside in its own encryption
> zone and only be accessible through each zone's encryption key.
>
> Is this possible? From the documentation I have been reading, it is
> recommended that all HBase data be in a single encryption zone, so this
> would preclude the ability to create different zones for each namespace.
>
> If this is not possible, are there any plans to add this support in the
> future?
>
> If we can't use HDFS encryption zones, is there any way to isolate each
> tenant's data through some other encryption mechanism?
>
> Regards,
> Roberta Marton
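Esteban's per-cluster suggestion amounts to giving each tenant's cluster its own hbase-site.xml pointing into that tenant's encryption zone. A minimal sketch for one tenant; the paths, znode name, and naming convention here are illustrative assumptions, not values from this thread:

```xml
<!-- hbase-site.xml for a hypothetical "tenant-a" cluster -->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <!-- must resolve to a path inside tenant-a's HDFS encryption zone -->
    <value>hdfs://nameservice1/ez/tenant-a/hbase</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>/tmp/hbase-tenant-a</value>
  </property>
  <property>
    <name>zookeeper.znode.parent</name>
    <!-- each cluster needs its own znode root when sharing a ZK ensemble -->
    <value>/hbase-tenant-a</value>
  </property>
</configuration>
```

Each additional tenant would get the same three settings (plus distinct port numbers if the clusters share hosts), repeated with its own encryption-zone path and znode root.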
Re: limiting user threads on client
There are different thread pools in the client, and some of them depend on how you are constructing Connection and Table instances.

The first thread pool is the one owned by the connection. If you are using ConnectionFactory.createConnection() (which you should), then this is the property that controls how many threads the connection pool can have:

  hbase.hconnection.threads.max

and this one configures when idle threads will be discarded:

  hbase.hconnection.threads.keepalivetime

You can also give your own thread pool to the Connection object if you want to control threading behavior.

If you are creating HTable or Table objects from the Connection, then by default they share the same thread pool, so you do not have to do anything. Otherwise, the HTable objects can have their own thread pools as well.

Then there are RPC-level thread pools. In 1.x versions (unless you have the netty-based async RPC), there is one thread per regionserver that the client talks to. I don't think there is a limit on how many of these the client can have at a single time. So, if the client ends up doing RPCs to many servers, there will be one thread per server.

You should probably use jstack or kill -3 to inspect the HBase client threads.

Enis

On Mon, Mar 13, 2017 at 2:57 PM, anil gupta wrote:
> I think you need to set that property before you make the HBaseConfiguration
> object. Have you tried that?
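Enis mentions that you can give your own thread pool to the Connection object. A minimal JDK-only sketch of such a bounded pool, mirroring the shape of HTable's default executor but with an explicit cap instead of Integer.MAX_VALUE; the class name, limits, and rejection policy below are illustrative assumptions, not HBase defaults:

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedClientPool {
    // Like HTable's default executor, this uses a SynchronousQueue so work is
    // handed straight to a worker thread, but the pool size is capped.
    public static ThreadPoolExecutor create(int maxThreads, long keepAliveSeconds) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            maxThreads, maxThreads,
            keepAliveSeconds, TimeUnit.SECONDS,
            new SynchronousQueue<Runnable>(),
            // When all workers are busy, run the task in the caller's thread:
            // back-pressure instead of RejectedExecutionException.
            new ThreadPoolExecutor.CallerRunsPolicy());
        // Let idle threads die after the keep-alive, so the pool shrinks again.
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = create(8, 60);
        System.out.println(pool.getMaximumPoolSize()); // 8
        pool.shutdown();
    }
}
```

In HBase 1.x a pool like this could then be handed to ConnectionFactory.createConnection(conf, pool), which keeps the client's thread count bounded regardless of the hbase.hconnection.threads.max default.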
Re: limiting user threads on client
I think you need to set that property before you make the HBaseConfiguration object. Have you tried that?

On Mon, Mar 13, 2017 at 10:24 AM, Henning Blohm wrote:
> Unfortunately it doesn't seem to make a difference. I see that the
> configuration has hbase.htable.threads.max=1 right before setting up the
> Connection, but then I still get hundreds of hconnection-*** threads. Is
> that actually Zookeeper?

--
Thanks & Regards,
Anil Gupta
Re: HBase namespaces and encryption
Hello Roberta,

I think there are too many caveats to using an encryption zone per namespace. In particular, bulkloading and the archive will fail. Another problem is the security guarantees you are offering to your tenants: the WALs and oldWALs directories will have to be under the same encryption zone, not to mention that data from multiple tenants can potentially be written to the same WAL file, which could be difficult to justify to your tenants.

The simplest approach I can think of is to have 3 different HBase clusters, each pointing to its own encryption zone on a different hbase.rootdir. It will be necessary to configure hbase.tmp.dir, zookeeper.znode.parent, port numbers, etc., but I think that is much simpler than specifying an encryption zone per namespace.

thanks!
esteban.

--
Cloudera, Inc.

On Fri, Mar 10, 2017 at 3:11 PM, Roberta Marton wrote:
> I have been researching how our product can use namespaces to aid in
> multi-tenancy support. For example, I have 3 tenants that need to be
> isolated from each other. Ideally, each tenant would have its own
> namespace and its own set of permissions applied.
>
> What I would also like to do is integrate HDFS encryption with
> namespaces. That is, each namespace would reside in its own encryption
> zone and only be accessible through each zone's encryption key.
>
> Is this possible? From the documentation I have been reading, it is
> recommended that all HBase data be in a single encryption zone, so this
> would preclude the ability to create different zones for each namespace.
>
> If this is not possible, are there any plans to add this support in the
> future?
>
> If we can't use HDFS encryption zones, is there any way to isolate each
> tenant's data through some other encryption mechanism?
>
> Regards,
> Roberta Marton
Re: limiting user threads on client
Unfortunately it doesn't seem to make a difference.

I see that the configuration has hbase.htable.threads.max=1 right before setting up the Connection, but then I still get hundreds of

  hconnection-***

threads. Is that actually Zookeeper?

Thanks,
Henning

On 13.03.2017 17:28, Ted Yu wrote:
> Are you using the Java client? See the following in HTable:
>
>   public static ThreadPoolExecutor getDefaultExecutor(Configuration conf) {
>     int maxThreads = conf.getInt("hbase.htable.threads.max", Integer.MAX_VALUE);
>
> FYI
Re: limiting user threads on client
It's that simple...? Thanks so much! Will give it a try right away.

Thanks,
Henning

On 13.03.2017 17:28, Ted Yu wrote:
> Are you using the Java client? See the following in HTable:
>
>   public static ThreadPoolExecutor getDefaultExecutor(Configuration conf) {
>     int maxThreads = conf.getInt("hbase.htable.threads.max", Integer.MAX_VALUE);
>
> FYI
Re: limiting user threads on client
Are you using the Java client? See the following in HTable:

  public static ThreadPoolExecutor getDefaultExecutor(Configuration conf) {
    int maxThreads = conf.getInt("hbase.htable.threads.max", Integer.MAX_VALUE);

FYI

On Mon, Mar 13, 2017 at 9:14 AM, Henning Blohm wrote:
> Hi,
>
> I am running an HBase client on a very resource-limited machine. In
> particular, numproc is limited, so I frequently get "Cannot create native
> thread" OOMs. I noticed that, particularly in write situations, the
> hconnection pool grows into the hundreds of threads, even when writing
> with fewer than ten application threads. Threads are discarded again
> after some minutes.
>
> In conjunction with other programs running on that machine, this
> sometimes leads to an "overload" situation.
>
> Is there a way to keep thread pool usage limited, or in some closer
> relation to the actual concurrency required?
>
> Thanks,
> Henning
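Since the default for hbase.htable.threads.max is Integer.MAX_VALUE, capping it has to be done explicitly, for example in the client's hbase-site.xml; the numeric values below are illustrative, not recommendations:

```xml
<property>
  <name>hbase.htable.threads.max</name>
  <!-- cap the per-HTable pool; default is Integer.MAX_VALUE -->
  <value>16</value>
</property>
<property>
  <name>hbase.hconnection.threads.max</name>
  <!-- cap the connection-owned pool -->
  <value>32</value>
</property>
```

Alternatively, calling conf.setInt("hbase.htable.threads.max", 16) should work, provided it happens before the Connection and HTable objects are created, since the default executor reads the value at construction time.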
limiting user threads on client
Hi,

I am running an HBase client on a very resource-limited machine. In particular, numproc is limited, so I frequently get "Cannot create native thread" OOMs. I noticed that, particularly in write situations, the hconnection pool grows into the hundreds of threads, even when writing with fewer than ten application threads. Threads are discarded again after some minutes.

In conjunction with other programs running on that machine, this sometimes leads to an "overload" situation.

Is there a way to keep thread pool usage limited, or in some closer relation to the actual concurrency required?

Thanks,
Henning
HBase + Spark join
Hi,

I want to join a Spark RDD with an HBase table. I'm familiar with the different connectors available but couldn't find this functionality.

The idea I have is to first sort the RDD according to a byte[] key [1] and then use rdd.mapPartitions so that each partition contains a unique, sequentially sorted range of keys that lines up with the key order in HBase. I should mention that the RDD will always contain almost all of the keys that are stored in HBase, so full table scans are fine.

Unfortunately, Spark cannot sort native Java byte[] out of the box (byte[] does not implement Comparable), and I'm also not sure whether mapPartitions really maintains the total sort order of the original RDD.

Any suggestions?

Cheers,
-Kristoffer

[1] Guava UnsignedBytes.lexicographicalComparator
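For reference, the Guava comparator cited in [1] can be approximated with plain JDK code. A minimal sketch of unsigned lexicographic byte[] comparison, which matches how HBase orders row keys; the class and constant names are my own, not from any library:

```java
import java.util.Arrays;
import java.util.Comparator;

public class UnsignedLex {
    // Compare byte arrays byte-by-byte as unsigned values; a strict
    // prefix sorts before any longer array that extends it.
    public static final Comparator<byte[]> COMPARATOR = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xFF) - (b[i] & 0xFF); // mask to get unsigned value
            if (cmp != 0) return cmp;
        }
        return a.length - b.length;
    };

    public static void main(String[] args) {
        byte[][] keys = { {(byte) 0x80}, {0x01}, {0x01, 0x00} };
        Arrays.sort(keys, COMPARATOR);
        // 0x80 sorts last: it is 128 unsigned, not -128 as signed Java bytes.
        System.out.println(Integer.toHexString(keys[2][0] & 0xFF)); // 80
    }
}
```

To use this with Spark the comparator would need a Serializable wrapper, but the ordering itself is the easy part: sortBy/sortByKey range-partition the RDD and sort within each partition, and mapPartitions applies its function to each partition's iterator without reordering elements, so intra-partition order is preserved.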