Re: Writing visibility labels with HFileOutputFormat2
>> so long as only the HBase user and the spark user can read/write to the file, I'm not sure what the risk is?

I was speaking more about the sensitivity of the data that was written. Say there are the following users:

Admin
Manager
Worker1
Worker2

and the following labels:

CONFIDENTIAL, SECRET, PUBLIC, WORKER_1_INFO, WORKER_2_INFO

Now suppose the Manager has associated Worker1 with WORKER_1_INFO and Worker2 with WORKER_2_INFO. When Worker1 reads his information, he should set WORKER_1_INFO in his scan. In a bulk load scenario the entire file is read, so the user doing the bulk load in this example should not be Worker1 or Worker2. It should be either the Admin or the Manager. If in your case the spark user and the hbase user are the equivalent of Admin or Manager (as in my example), then it is perfectly fine.

>> am I able to read the HFile manually to determine if Tags have been written properly?

HBASE-15707 is a case where the tags were not being written while creating the file. You may need that fix when you are adding tags directly. In your case, though, they are visibility tags, which you are not supposed to add directly other than through setCellVisibility(). Still, it is better to have that fix in your branch as well.

>> "hbase.security.visibility.mutations.checkauths" - for now the method of set_auths 'client','system' along with only giving 'client' read on 'hbase:labels' is working for me.

Fine. I have some doubts here about how SYSTEM tags are implemented. Will get back on this.

Regards
Ram

On Thu, Jun 16, 2016 at 9:11 PM, Ellis, Tom (Financial Markets IT) <tom.el...@lloydsbanking.com.invalid> wrote:

> Hi Again Ram,
>
> "hbase.security.visibility.mutations.checkauths" - for now the method of
> set_auths 'client','system' along with only giving 'client' read on
> 'hbase:labels' is working for me.
>
> "Coming to reading the HFile and creating a bulk load, I think we should
> be more cautious here " - I don't follow again sorry. The spark user writes
> the HFile, and then initiates the load with
> LoadIncrementalHFiles.doBulkLoad - so long as only the HBase user and the
> spark user can read/write to the file, I'm not sure what the risk is?
>
> HBASE-15707 - am I able to read the HFile manually to determine if Tags
> have been written properly?
>
> Cheers,
>
> Tom
>
> -----Original Message-----
> From: ramkrishna vasudevan [mailto:ramkrishna.s.vasude...@gmail.com]
> Sent: 16 June 2016 06:01
> To: user@hbase.apache.org
> Subject: Re: Writing visibility labels with HFileOutputFormat2
>
> -- This email has reached the Bank via an external source --
>
> Thanks for the updates here. Going through the mails here
>
> >> Why is it that a client user without admin/super user privileges can
> >> set a visibility expression using Put.setCellVisibility, but if we want
> >> to write using HFiles,
>
> I get your point now. There is a property
> "hbase.security.visibility.mutations.checkauths" if set will check if the
> user is authorized to mutate the visibility labels that he is trying to
> write. If the user is not allowed to add that label the mutation will fail.
> Can you see if this solves the other problem of allowing any client user
> to write? If the above is not well documented pls feel free to raise a JIRA
> and we are happy to address it.
>
> Coming to reading the HFile and creating a bulk load, I think we should be
> more cautious here. There are some critical info stored in the HFile and
> just allowing any user to read it is going to be risky.
>
> Coming to the PutSortReducer problem, I think what you say is true. Not
> sure if there is a bug already, if not pls feel free to raise a bug here.
> We need to fix it.
>
> HBASE-15707 - you may need this because for scala's HBasecontext you need
> to ensure tags are included just incase ImportTSV has to be used.
>
> Write back, if I had missed something or if my info was lacking. Its been
> quite sometime we had worked in this area so have to see code every time to
> know what was done.
>
> Regards
> Ram
>
> On Wed, Jun 15, 2016 at 11:29 PM, Ellis, Tom (Financial Markets IT) <
> tom.el...@lloydsbanking.com.invalid> wrote:
>
> > So, I can see that I can correctly get the Lists from the
> > VisibilityExpressionResolver, set them on the Cell, and write them
> > using HFileOutputFormat2, however when I scan using an unprivileged
> > user I can still see the cells. If I write the cells with
> > setCellVisibility the unprivileged user can't see them.
> >
> > Then I noticed the fix for HBASE-15707. I am using the Hortonworks'
> > HBase 1.1.2 - am I affected by this / does HFileOutputFormat2 support
> > tags before this fix?
> >
> > Cheers,
> >
> > Tom Ellis
> > Consultant Developer – Excelian
> > Data Lake | Financial Markets IT
> > LLOYDS BANK COMMERCIAL BANKING
> >
> > E: tom.el...@lloydsbanking.com
> > Website: www.lloydsbankcommercial.com
> >
> > Reduce printing. Lloyds Banking Group is helping t
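
The Worker1/Worker2 example in Ram's reply can be modelled in a few lines of plain Java. This is only a minimal sketch under stated assumptions, not HBase's visibility implementation: the class name `LabelAuthSketch`, the user-to-label table, and the choice to give Admin and Manager every label are illustrative, and each cell carries a single label rather than a full visibility expression.

```java
import java.util.Map;
import java.util.Set;

// Simplified model of label-based read checks. NOT the HBase visibility API.
final class LabelAuthSketch {
    // Labels named in the thread.
    static final Set<String> ALL_LABELS =
            Set.of("CONFIDENTIAL", "SECRET", "PUBLIC", "WORKER_1_INFO", "WORKER_2_INFO");

    // Assumption: Admin and Manager hold every label. The thread only says
    // that they are the users who should run the bulk load.
    static final Map<String, Set<String>> AUTHS = Map.of(
            "Admin", ALL_LABELS,
            "Manager", ALL_LABELS,
            "Worker1", Set.of("WORKER_1_INFO"),
            "Worker2", Set.of("WORKER_2_INFO"));

    /** True if the user holds the single label attached to a cell. */
    static boolean canRead(String user, String cellLabel) {
        return AUTHS.getOrDefault(user, Set.of()).contains(cellLabel);
    }

    /** True if the user holds every label that occurs in a file. */
    static boolean canReadAll(String user, Set<String> labelsInFile) {
        return AUTHS.getOrDefault(user, Set.of()).containsAll(labelsInFile);
    }

    public static void main(String[] args) {
        // A bulk-loaded file that mixes both workers' cells.
        Set<String> fileLabels = Set.of("WORKER_1_INFO", "WORKER_2_INFO");
        System.out.println("Worker1 reads own cell:   " + canRead("Worker1", "WORKER_1_INFO"));
        System.out.println("Worker1 reads other cell: " + canRead("Worker1", "WORKER_2_INFO"));
        System.out.println("Worker1 covers the file:  " + canReadAll("Worker1", fileLabels));
        System.out.println("Manager covers the file:  " + canReadAll("Manager", fileLabels));
    }
}
```

In this model a scan filters cell by cell, while anyone who reads the raw file sees every cell in it, which is why the sketch checks the file against the full set of labels it contains.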
Re:Re: maybe waste on blockCache
I set blockCache on for all user tables, but set the IN_MEMORY conf to false.

At 2016-06-16 18:18:44, "Heng Chen" wrote:

>bq. if we do not set any user tables IN_MEMORY to true, then the whole
>hbase just need to cache hbase:meta data to in_memory LruBlockCache.
>
>You set blockcache to be false for other tables?
>
>2016-06-16 16:21 GMT+08:00 WangYQ :
>
>> in hbase 0.98.10, if we use LruBlockCache, and set regionServer's max heap
>> to 10G
>> in default:
>> the size of in_memory priority of LruBlockCache is :
>> 10G * 0.4 * 0.25 = 1G
>>
>> 0.4: hfile.block.cache.size
>> 0.25: hbase.lru.blockcache.memory.percentage
>>
>> if we do not set any user tables IN_MEMORY to true, then the whole hbase
>> just need to cache hbase:meta data to in_memory LruBlockCache.
>> hbase:meta does not split , so just need one regionServer to cache, so
>> there is some waste in blockCache
>>
>> i think the regionServer open hbase:meta need to set in_memory
>> LruBlockCache to a certain size
>> other regionServer set hbase.lru.blockcache.memory.percentage to 0, do not
>> need to allocate in_memory LruBlockCache.
Re: HBase number of columns
There is no real column schema in HBase other than the column family definition. Each write to a column writes a cell containing the column name plus the value, so in theory the number of columns doesn't really matter. What matters is how much data you read and write.

That said, the column family schema has a DATA_BLOCK_ENCODING setting that affects how much space each column/cell actually takes. FAST_DIFF is a decent choice to avoid too much redundancy from writing the same column name over and over again when lots of rows have the same column names. There are also compression settings, of course.

Hope that helps.

Saad

On Wed, Jun 15, 2016 at 7:11 AM, Siddharth Ubale <siddharth.ub...@syncoms.com> wrote:

> Hi,
>
> As per the official documentation of HBase it is mentioned that HBase
> typical schema should contain 1 to 3 column families per table (
> https://hbase.apache.org/book.html#table_schema_rules_of_thumb ) .
> However there is no mention of how many column qualifiers should a row
> contain for each column family to see good read & write performance.
> Could anybody let us know their input on how many columns per row is
> desirable in HBase or how many column qualifiers per column family would be
> desirable.
>
> Thanks,
> Siddharth Ubale,
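
Saad's point that every cell repeats the row key and column name can be made concrete with a rough size estimate. The sketch below assumes the plain, unencoded KeyValue layout (two 4-byte length prefixes, a 2-byte row length, a 1-byte family length, an 8-byte timestamp and a 1-byte type) and ignores tags, DATA_BLOCK_ENCODING and compression. The class name and the example lengths are hypothetical.

```java
// Rough per-cell size estimate for an unencoded KeyValue without tags.
// Illustration only; actual on-disk size depends on encoding and compression.
final class CellSizeSketch {
    // key length (4) + value length (4) + row length (2) + family length (1)
    // + timestamp (8) + key type (1)
    static final int FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1;

    /** Bytes taken by one cell: fixed overhead plus the variable-length parts. */
    static long cellBytes(int rowLen, int familyLen, int qualifierLen, int valueLen) {
        return (long) FIXED_OVERHEAD + rowLen + familyLen + qualifierLen + valueLen;
    }

    /** Bytes taken by one row whose columns all have the same lengths. */
    static long rowBytes(int rowLen, int familyLen, int qualifierLen, int valueLen, int columns) {
        return columns * cellBytes(rowLen, familyLen, qualifierLen, valueLen);
    }

    public static void main(String[] args) {
        // Hypothetical shape: 16-byte row key, 1-byte family, 12-byte qualifier, 8-byte value.
        long perCell = cellBytes(16, 1, 12, 8);
        long perRow = rowBytes(16, 1, 12, 8, 50);
        System.out.println("bytes per cell: " + perCell);
        System.out.println("bytes per 50-column row: " + perRow);
        System.out.println("value bytes per 50-column row: " + 50 * 8);
    }
}
```

With these example lengths, most of each cell is key material rather than value, which is the redundancy that an encoding such as FAST_DIFF is meant to reduce.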
RE: Writing visibility labels with HFileOutputFormat2
Hi Again Ram,

"hbase.security.visibility.mutations.checkauths" - for now the method of set_auths 'client','system' along with only giving 'client' read on 'hbase:labels' is working for me.

"Coming to reading the HFile and creating a bulk load, I think we should be more cautious here" - sorry, I don't follow again. The spark user writes the HFile, and then initiates the load with LoadIncrementalHFiles.doBulkLoad - so long as only the HBase user and the spark user can read/write to the file, I'm not sure what the risk is?

HBASE-15707 - am I able to read the HFile manually to determine if Tags have been written properly?

Cheers,

Tom

-----Original Message-----
From: ramkrishna vasudevan [mailto:ramkrishna.s.vasude...@gmail.com]
Sent: 16 June 2016 06:01
To: user@hbase.apache.org
Subject: Re: Writing visibility labels with HFileOutputFormat2

Thanks for the updates here. Going through the mails here

>> Why is it that a client user without admin/super user privileges can
>> set a visibility expression using Put.setCellVisibility, but if we want
>> to write using HFiles,

I get your point now. There is a property, "hbase.security.visibility.mutations.checkauths", which if set will check whether the user is authorized to mutate the visibility labels that he is trying to write. If the user is not allowed to add that label, the mutation will fail. Can you see if this solves the other problem of allowing any client user to write? If the above is not well documented, pls feel free to raise a JIRA and we are happy to address it.

Coming to reading the HFile and creating a bulk load, I think we should be more cautious here. There is some critical info stored in the HFile, and just allowing any user to read it is going to be risky.

Coming to the PutSortReducer problem, I think what you say is true. Not sure if there is a bug already; if not, pls feel free to raise a bug here. We need to fix it.

HBASE-15707 - you may need this because for scala's HBaseContext you need to ensure tags are included just in case ImportTSV has to be used.

Write back if I have missed something or if my info was lacking. It's been quite some time since we worked in this area, so I have to look at the code every time to know what was done.

Regards
Ram

On Wed, Jun 15, 2016 at 11:29 PM, Ellis, Tom (Financial Markets IT) <tom.el...@lloydsbanking.com.invalid> wrote:

> So, I can see that I can correctly get the Lists from the
> VisibilityExpressionResolver, set them on the Cell, and write them
> using HFileOutputFormat2, however when I scan using an unprivileged
> user I can still see the cells. If I write the cells with
> setCellVisibility the unprivileged user can't see them.
>
> Then I noticed the fix for HBASE-15707. I am using the Hortonworks'
> HBase 1.1.2 - am I affected by this / does HFileOutputFormat2 support
> tags before this fix?
>
> Cheers,
>
> Tom Ellis
>
> -----Original Message-----
> From: Ellis, Tom (Financial Markets IT) [mailto:
> tom.el...@lloydsbanking.com.INVALID]
> Sent: 15 June 2016 17:42
> To: user@hbase.apache.org
> Subject: RE: Writing visibility labels with HFileOutputFormat2
>
> Looking at the source for how DefaultCellLabelServiceImpl checks
> authorisation I noted it's just that the user just needs to have the
> 'system' label auth privileges - not admin/super user as I thought you
> meant Ram. So technically, I could have a client user that is given
> the system label privileges, but only read access to the 'hbase:labels' table?
>
> Then that user will still be able to scan and read the labels +
> ordinal, and create the tags correctly :) I'll give it a go..
>
> Cheers,
>
> Tom Ellis
>
> -----Original Message-----
> From: Ellis, Tom (Financial Markets IT) [mailto:
> tom.el...@lloydsbanking.com.INVALID]
> Sent: 15 June 2016 16:56
> To: user@hbase.apache.org
> Subject: RE: Writing visibility labels with HFileOutputFormat2
>
> I see now from some other examples I've found that actually this form
> of using HFileOutputFormat2 to write Puts will use the PutSortReducer
> if you set the map output class of the job you give it to Put. Looking
Re: maybe waste on blockCache
bq. if we do not set any user tables IN_MEMORY to true, then the whole hbase just need to cache hbase:meta data to in_memory LruBlockCache.

Did you set blockcache to false for the other tables?

2016-06-16 16:21 GMT+08:00 WangYQ :

> in hbase 0.98.10, if we use LruBlockCache, and set regionServer's max heap
> to 10G
> in default:
> the size of in_memory priority of LruBlockCache is :
> 10G * 0.4 * 0.25 = 1G
>
> 0.4: hfile.block.cache.size
> 0.25: hbase.lru.blockcache.memory.percentage
>
> if we do not set any user tables IN_MEMORY to true, then the whole hbase
> just need to cache hbase:meta data to in_memory LruBlockCache.
> hbase:meta does not split , so just need one regionServer to cache, so
> there is some waste in blockCache
>
> i think the regionServer open hbase:meta need to set in_memory
> LruBlockCache to a certain size
> other regionServer set hbase.lru.blockcache.memory.percentage to 0, do not
> need to allocate in_memory LruBlockCache.
maybe waste on blockCache
In HBase 0.98.10, if we use LruBlockCache and set the regionServer's max heap to 10G, then by default the size of the in_memory priority area of LruBlockCache is:

10G * 0.4 * 0.25 = 1G

0.4: hfile.block.cache.size
0.25: hbase.lru.blockcache.memory.percentage

If we do not set IN_MEMORY to true on any user table, then the whole of HBase only needs to cache hbase:meta data in the in_memory area of LruBlockCache. hbase:meta does not split, so only one regionServer needs to cache it, and so there is some waste in the blockCache.

I think the regionServer that opens hbase:meta should set the in_memory LruBlockCache to a certain size, and the other regionServers should set hbase.lru.blockcache.memory.percentage to 0 so that they do not need to allocate an in_memory LruBlockCache.
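
The arithmetic in this message can be reproduced with a short Java sketch. It only multiplies the three figures quoted above (heap size, hfile.block.cache.size, hbase.lru.blockcache.memory.percentage) and is not HBase's own sizing code. The class and method names are hypothetical.

```java
// Reproduces the 10G * 0.4 * 0.25 calculation from the message.
// Illustration only; this is not how HBase itself is invoked or configured.
final class InMemoryCacheShare {
    static final double BLOCK_CACHE_FRACTION = 0.4;  // hfile.block.cache.size
    static final double IN_MEMORY_FRACTION = 0.25;   // hbase.lru.blockcache.memory.percentage

    /** Bytes of heap that fall into the in_memory priority share of the block cache. */
    static long inMemoryBytes(long heapBytes, double cacheFraction, double memoryFraction) {
        return (long) (heapBytes * cacheFraction * memoryFraction);
    }

    public static void main(String[] args) {
        long heap = 10L * 1024 * 1024 * 1024; // 10G region server heap
        long share = inMemoryBytes(heap, BLOCK_CACHE_FRACTION, IN_MEMORY_FRACTION);
        System.out.println("in_memory share in bytes: " + share);
        // With the fraction set to 0, as proposed for the other regionServers:
        System.out.println("with percentage 0: " + inMemoryBytes(heap, BLOCK_CACHE_FRACTION, 0.0));
    }
}
```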