Re: Writing visibility labels with HFileOutputFormat2

2016-06-16 Thread ramkrishna vasudevan
>>so long as only the HBase user and the spark user can read/write to the
file, I'm not sure what the risk is?
I was referring more to the sensitivity of the data that was written.
Say there are the following users:
Admin
Manager
Worker 1
Worker 2

and the following labels:
CONFIDENTIAL, SECRET, PUBLIC, WORKER_1_INFO, WORKER_2_INFO
Suppose the Manager has associated Worker 1 with WORKER_1_INFO and Worker 2
with WORKER_2_INFO. When Worker 1 tries to read his information, he
should set WORKER_1_INFO in his scan.

In a bulk load scenario the entire file is read, so the user doing the
bulk load in this example should not be Worker 1 or Worker 2. It should be
either the Admin or the Manager.

If, in your case, the spark user and the hbase user correspond to the Admin
or Manager in my example, then it is perfectly fine.
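
To make the scenario above concrete, here is a minimal Python sketch of the
idea. It is a toy model, not HBase code: the Cell class, the USER_AUTHS
table, the sample rows and both functions are hypothetical, and the label
sets given to Admin and Manager are an assumption. It only illustrates that a
label-checked scan filters cells, while reading the file directly does not.

```python
# Toy model of visibility labels; not HBase code. All names are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    value: str
    label: str  # a single visibility label guarding this cell


ALL_LABELS = {"CONFIDENTIAL", "SECRET", "PUBLIC", "WORKER_1_INFO", "WORKER_2_INFO"}

# Labels each user holds. Giving Admin and Manager every label is an assumption.
USER_AUTHS = {
    "Admin": set(ALL_LABELS),
    "Manager": set(ALL_LABELS),
    "Worker 1": {"WORKER_1_INFO"},
    "Worker 2": {"WORKER_2_INFO"},
}

# Stand-in for the cells stored in one file.
FILE_CELLS = [
    Cell("w1", "details of worker 1", "WORKER_1_INFO"),
    Cell("w2", "details of worker 2", "WORKER_2_INFO"),
]


def scan(user, requested_labels):
    """Label-checked read: return only cells whose label the user both holds and requested."""
    effective = USER_AUTHS[user] & set(requested_labels)
    return [cell for cell in FILE_CELLS if cell.label in effective]


def read_raw_file(cells):
    """Reading the file directly applies no label check at all."""
    return list(cells)


print([c.row for c in scan("Worker 1", ["WORKER_1_INFO"])])
print([c.row for c in read_raw_file(FILE_CELLS)])
```

In this model Worker 1 cannot reach Worker 2's cell through scan(), even by
asking for WORKER_2_INFO, but anyone who can call read_raw_file() sees
everything. That is why the user doing the bulk load should be a trusted one.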

>>am I able to read the HFile manually to determine if Tags have been
written properly?
HBASE-15707 covers a case where tags were not being written while creating
the file. You may need that fix when you add tags directly. In your case,
though, they are visibility tags, which you are not supposed to add directly;
they should be set using setCellVisibility(). Still, it is better to have
that fix in your branch as well.

>>"hbase.security.visibility.mutations.checkauths" - for now the method of
set_auths 'client','system' along with only giving 'client' read on
'hbase:labels' is working for me.

Fine. I have some doubts about how SYSTEM tags are implemented. I will get
back to you on this.

Regards
Ram


Re:Re: maybe waste on blockCache

2016-06-16 Thread WangYQ


I enabled blockCache on all user tables, but set IN_MEMORY to false on them.

At 2016-06-16 18:18:44, "Heng Chen"  wrote:
>bq. if we do not set any user tables IN_MEMORY to true, then the whole
>hbase just need to cache hbase:meta data to in_memory LruBlockCache.
>
>You set blockcache to be false for other tables?


Re: HBase number of columns

2016-06-16 Thread Saad Mufti
There is no real column schema in HBase beyond defining the column
families. Each write to a column writes a cell containing the column name
plus the value, so in theory the number of columns doesn't really matter.
What matters is how much data you read and write.

That said, there are settings in the column family schema for
DATA_BLOCK_ENCODING that affect how much actual space each column/cell
takes. FAST_DIFF is a decent choice to make sure there is not too much
redundancy from writing the same column name over and over again when lots
of rows have the same column name. There are also compression settings, of
course.
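
As a rough illustration of why encoding repeated key bytes helps, here is a
small Python sketch of plain prefix encoding over sorted cell keys. It is a
toy, not the FAST_DIFF on-disk format: the key layout, the column names and
the 2-byte cost assumed for each prefix length are made up for the example.
It only shows savings on the leading bytes that consecutive keys share (row
key, family, and any common start of the column name).

```python
# Toy prefix encoding of sorted cell keys; this is NOT HBase's FAST_DIFF format.

def make_keys(n_rows, qualifiers, family="cf"):
    """Build sorted keys of the hypothetical form row/family:qualifier."""
    return sorted(
        f"row{r:05d}/{family}:{q}".encode()
        for r in range(n_rows)
        for q in qualifiers
    )


def common_prefix_len(a, b):
    """Number of leading bytes that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_encode(keys):
    """Store each key as (shared_prefix_length, suffix) relative to the previous key."""
    encoded, prev = [], b""
    for key in keys:
        shared = common_prefix_len(prev, key)
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded


def prefix_decode(encoded):
    """Rebuild the original keys from (shared_prefix_length, suffix) pairs."""
    keys, prev = [], b""
    for shared, suffix in encoded:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


keys = make_keys(1000, ["customer_name", "customer_address", "customer_phone"])
encoded = prefix_encode(keys)

raw_bytes = sum(len(k) for k in keys)
# Assumption: 2 bytes to store each shared-prefix length.
encoded_bytes = sum(2 + len(suffix) for _, suffix in encoded)
print("raw key bytes:", raw_bytes, "prefix-encoded key bytes:", encoded_bytes)
```

How much a real encoder saves depends on the actual keys, so it is worth
measuring on your own data.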

Hope that helps.


Saad


On Wed, Jun 15, 2016 at 7:11 AM, Siddharth Ubale <
siddharth.ub...@syncoms.com> wrote:

> Hi,
>
> According to the official HBase documentation, a typical HBase schema
> should contain 1 to 3 column families per table (
> https://hbase.apache.org/book.html#table_schema_rules_of_thumb ).
> However, there is no mention of how many column qualifiers a row should
> contain in each column family for good read and write performance.
> Could anybody share their input on how many columns per row, or how many
> column qualifiers per column family, is desirable in HBase?
> Thanks,
> Siddharth Ubale,
>
>


RE: Writing visibility labels with HFileOutputFormat2

2016-06-16 Thread Ellis, Tom (Financial Markets IT)
Hi Again Ram,

"hbase.security.visibility.mutations.checkauths" - for now the method of 
set_auths 'client','system' along with only giving 'client' read on 
'hbase:labels' is working for me.

"Coming to reading the HFile and creating a bulk load, I think we should be 
more cautious here" - sorry, I don't follow again. The spark user writes the
HFile and then initiates the load with LoadIncrementalHFiles.doBulkLoad. So
long as only the HBase user and the spark user can read/write the file, I'm
not sure what the risk is.

HBASE-15707 - am I able to read the HFile manually to determine if Tags have 
been written properly?

Cheers,

Tom


-----Original Message-----
From: ramkrishna vasudevan [mailto:ramkrishna.s.vasude...@gmail.com]
Sent: 16 June 2016 06:01
To: user@hbase.apache.org
Subject: Re: Writing visibility labels with HFileOutputFormat2

-- This email has reached the Bank via an external source --


Thanks for the updates here. Going through the mails:
>> Why is it that a client user without admin/super user privileges can set
>> a visibility expression using Put.setCellVisibility, but if we want to
>> write using HFiles,

I get your point now. There is a property,
"hbase.security.visibility.mutations.checkauths", which, if set, checks
whether the user is authorized for the visibility labels that he is trying to
write. If the user is not allowed to add that label, the mutation will fail.
Can you see if this solves the other problem of allowing any client user to
write? If the above is not well documented, please feel free to raise a JIRA
and we will be happy to address it.
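
The check described above can be pictured with a small Python sketch. It is
a simplified model and not the HBase implementation: labels_in() and
check_mutation() are hypothetical helpers, and the label parsing handles only
plain alphanumeric and underscore label names combined with &, |, ! and
parentheses.

```python
# Simplified model of a "check auths on mutation" rule; not HBase code.
import re


def labels_in(expression):
    """Extract the label names used in an expression such as "(A&B)|!C"."""
    return set(re.findall(r"[A-Za-z0-9_]+", expression))


def check_mutation(user_auths, expression, check_auths=True):
    """Return True if the write is allowed; raise PermissionError otherwise.

    With check_auths off, any user may write any visibility expression.
    With it on, every label in the expression must be one the user holds.
    """
    if not check_auths:
        return True
    missing = labels_in(expression) - set(user_auths)
    if missing:
        raise PermissionError(
            f"user is not authorized for labels: {sorted(missing)}"
        )
    return True


print(check_mutation({"SECRET", "PUBLIC"}, "SECRET|PUBLIC"))
try:
    check_mutation({"PUBLIC"}, "SECRET&PUBLIC")
except PermissionError as err:
    print("rejected:", err)
```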

Coming to reading the HFile and creating a bulk load, I think we should be more
cautious here. There is some critical information stored in the HFile, and
allowing just any user to read it is going to be risky.

Coming to the PutSortReducer problem, I think what you say is true. I am not
sure if there is a bug filed already; if not, please feel free to raise one.
We need to fix it.

HBASE-15707 - you may need this because, for Scala's HBaseContext, you need to
ensure tags are included, just in case ImportTSV has to be used.

Write back if I have missed something or if my info was lacking. It has been
quite some time since we worked in this area, so I have to look at the code
every time to know what was done.

Regards
Ram

On Wed, Jun 15, 2016 at 11:29 PM, Ellis, Tom (Financial Markets IT) < 
tom.el...@lloydsbanking.com.invalid> wrote:

> So, I can see that I can correctly get the Lists from the
> VisibilityExpressionResolver, set them on the Cell, and write them
> using HFileOutputFormat2, however when I scan using an unprivileged
> user I can still see the cells. If I write the cells with
> setCellVisibility the unprivileged user can't see them.
>
> Then I noticed the fix for HBASE-15707. I am using Hortonworks' HBase
> 1.1.2 - am I affected by this / does HFileOutputFormat2 support tags
> before this fix?
>
> Cheers,
>
> Tom Ellis
> Consultant Developer – Excelian
> Data Lake | Financial Markets IT
> LLOYDS BANK COMMERCIAL BANKING
>
>
> E: tom.el...@lloydsbanking.com
> Website: www.lloydsbankcommercial.com
> , , ,
> Reduce printing. Lloyds Banking Group is helping to build the low
> carbon economy.
> Corporate Responsibility Report:
> www.lloydsbankinggroup-cr.com/downloads
>
>
> -----Original Message-----
> From: Ellis, Tom (Financial Markets IT) [mailto:
> tom.el...@lloydsbanking.com.INVALID]
> Sent: 15 June 2016 17:42
> To: user@hbase.apache.org
> Subject: RE: Writing visibility labels with HFileOutputFormat2
>
>
> Looking at the source for how DefaultCellLabelServiceImpl checks
> authorisation, I noted that the user just needs to have the 'system'
> label auth privileges - not admin/super user, as I thought you meant,
> Ram. So technically, I could have a client user that is given the system
> label privileges, but only read access to the 'hbase:labels' table?
>
> Then that user will still be able to scan and read the labels +
> ordinal, and create the tags correctly :) I'll give it a go.
>
> Cheers,
>
> Tom Ellis
>
>
> -----Original Message-----
> From: Ellis, Tom (Financial Markets IT) [mailto:
> tom.el...@lloydsbanking.com.INVALID]
> Sent: 15 June 2016 16:56
> To: user@hbase.apache.org
> Subject: RE: Writing visibility labels with HFileOutputFormat2
>
>
> I see now from some other examples I've found that actually this form
> of using HFileOutputFormat2 to write Puts will use the PutSortReducer
> if you set the map output class of the job you give it to Put. Looking
>

Re: maybe waste on blockCache

2016-06-16 Thread Heng Chen
bq. if we do not set any user tables IN_MEMORY to true, then the whole
hbase just need to cache hbase:meta data to in_memory LruBlockCache.

Did you set blockcache to false for the other tables?



maybe waste on blockCache

2016-06-16 Thread WangYQ
In HBase 0.98.10, if we use LruBlockCache and set the regionServer's max heap
to 10G, then by default the size of the in_memory priority area of
LruBlockCache is:
10G * 0.4 * 0.25 = 1G


0.4: hfile.block.cache.size
0.25: hbase.lru.blockcache.memory.percentage


If we do not set IN_MEMORY to true on any user table, then the whole of HBase
only needs to cache hbase:meta data in the in_memory part of LruBlockCache.
hbase:meta does not split, so only one regionServer needs to cache it, and
there is some waste in the blockCache on the others.


I think the regionServer that opens hbase:meta should set the in_memory
LruBlockCache to a certain size, and the other regionServers should set
hbase.lru.blockcache.memory.percentage to 0 so that they do not need to
allocate an in_memory LruBlockCache.
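
The arithmetic above can be reproduced with a short Python sketch. The two
fractions are the values quoted in this mail; the function name and the
cluster size of 5 regionServers are made up for illustration, and the "idle"
figure simply follows the reasoning in this mail.

```python
# Reproduces the in_memory sizing arithmetic from this mail; illustration only.
from fractions import Fraction

GIB = 1024 ** 3


def in_memory_cache_bytes(max_heap_bytes, block_cache_fraction, in_memory_fraction):
    """heap * hfile.block.cache.size * hbase.lru.blockcache.memory.percentage."""
    return int(
        max_heap_bytes
        * Fraction(block_cache_fraction)
        * Fraction(in_memory_fraction)
    )


# 10G heap, hfile.block.cache.size = 0.4, memory.percentage = 0.25
per_server = in_memory_cache_bytes(10 * GIB, "0.4", "0.25")

region_servers = 5  # hypothetical cluster size
# Only one regionServer hosts hbase:meta, so by the argument above the
# in_memory share on all the other servers goes unused.
idle_share = per_server * (region_servers - 1)

print("in_memory share per regionServer (GiB):", per_server / GIB)
print("in_memory share not holding hbase:meta (GiB):", idle_share / GIB)
```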