Hi Ted and Vladimir, thanks! I was wondering whether using an index is a good idea. My scan/get criteria is something like "get all rows I inserted since the end of yesterday". I may have to use MapReduce + a timeRange filter.
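A minimal sketch of that kind of time-range scan with the plain HBase client API, assuming a hypothetical table "mytable" and a column family "d" (both names are illustrative, not from this thread); the range is applied to cell timestamps via Scan.setTimeRange():

import java.util.concurrent.TimeUnit;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class TimeRangeScanSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable"); // illustrative table name

        long now = System.currentTimeMillis();
        // "End of yesterday" approximated as the most recent UTC midnight.
        long startOfToday = now - (now % TimeUnit.DAYS.toMillis(1));

        Scan scan = new Scan();
        scan.addFamily(Bytes.toBytes("d"));       // illustrative family name
        scan.setTimeRange(startOfToday, now + 1); // [min, max) on cell timestamps
        scan.setCaching(100);                     // rows fetched per RPC

        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result row : scanner) {
                // ... process each row written since midnight ...
            }
        } finally {
            scanner.close();
        }
        table.close();
    }
}

The same setTimeRange() call can be put on the Scan handed to a MapReduce TableInputFormat job, so the filtering happens region-side either way.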
Lars and all, I will try to report back some performance data later. Thanks
for the help from you all.

Best regards,
Wei

---------------------------------
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan



From:   Ted Yu <[email protected]>
To:     "[email protected]" <[email protected]>,
Date:   01/29/2014 04:37 PM
Subject:        Re: larger HFile block size for very wide row?



bq. table:family2 holds only row keys (no data) from table:family1.

Wei:
You can designate family2 as an essential column family so that family1 is
brought into the heap only when needed.

On Wed, Jan 29, 2014 at 1:33 PM, Vladimir Rodionov
<[email protected]> wrote:

> Yes, your row will be split by KV boundaries - no need to increase the
> default block size, except, perhaps, for performance.
> You will need to try different sizes to find the optimal performance in
> your use case.
> I would not use a combination of scan & get on the same table:family with
> very large rows.
> Either some kind of secondary indexing is needed, or do the scan on a
> different family (which has the same row keys):
>
> table:family1 holds the original data
> table:family2 holds only row keys (no data) from table:family1.
> Your scan will be MUCH faster in this case.
>
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: [email protected]
>
> ________________________________________
> From: Wei Tan [[email protected]]
> Sent: Wednesday, January 29, 2014 12:52 PM
> To: [email protected]
> Subject: Re: larger HFile block size for very wide row?
>
> Sorry, 1000 columns, each 2K, so each row is 2M. I guess HBase will keep a
> single KV (i.e., a column rather than a row) in a block, so a row will
> span multiple blocks?
>
> My scan pattern is: I will do a range scan, find the matching row keys,
> and fetch the whole row for each row that matches my criteria.
>
> Best regards,
> Wei
>
> ---------------------------------
> Wei Tan, PhD
> Research Staff Member
> IBM T. J. Watson Research Center
> http://researcher.ibm.com/person/us-wtan
>
>
>
> From: lars hofhansl <[email protected]>
> To: "[email protected]" <[email protected]>,
> Date: 01/29/2014 03:49 PM
> Subject: Re: larger HFile block size for very wide row?
>
>
>
> You have 1000 columns? Not 1000k = 1M columns, I assume.
> So you'll have 2MB KVs. That's a bit on the large side.
>
> HBase will "grow" the block to fit the KV into it, which means you have
> basically one block per KV.
> I guess you address these rows via point gets (GET), and do not typically
> scan through them, right?
>
> Do you see any performance issues?
>
> -- Lars
>
>
>
> ________________________________
> From: Wei Tan <[email protected]>
> To: [email protected]
> Sent: Wednesday, January 29, 2014 12:35 PM
> Subject: larger HFile block size for very wide row?
>
>
> Hi, I have an HBase table where each row has ~1000k columns, ~2K each. My
> table scan pattern is to use a row key filter, but I need to fetch the
> whole row (~1000k columns) back.
>
> Shall I set the HFile block size to be larger than the default 64K?
> Thanks,
> Wei
>
> ---------------------------------
> Wei Tan, PhD
> Research Staff Member
> IBM T. J. Watson Research Center
> http://researcher.ibm.com/person/us-wtan
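A rough sketch of the two-family pattern Vladimir describes (and that Ted's essential-column-family tip refines), assuming a hypothetical table "mytable" with a wide data family "d" (family1) and a key-only family "k" (family2); the table name, family names, and row-key bounds are illustrative, not from this thread:

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class KeyFamilyScanSketch {

    private static final byte[] DATA_FAMILY = Bytes.toBytes("d"); // "family1": the wide ~2MB rows
    private static final byte[] KEY_FAMILY  = Bytes.toBytes("k"); // "family2": row keys only, no data

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable"); // illustrative table name

        // Phase 1: range-scan only the narrow key family, so the region server
        // never reads the wide data blocks while matching row keys.
        Scan scan = new Scan(Bytes.toBytes("row-0000"), Bytes.toBytes("row-9999"));
        scan.addFamily(KEY_FAMILY);
        scan.setCaching(1000); // many small key-only cells per RPC is cheap

        List<Get> gets = new ArrayList<Get>();
        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result keyOnly : scanner) {
                Get get = new Get(keyOnly.getRow());
                get.addFamily(DATA_FAMILY); // fetch the full wide row in phase 2
                gets.add(get);
            }
        } finally {
            scanner.close();
        }

        // Phase 2: batch-get the wide rows, but only for the keys that matched.
        Result[] wideRows = table.get(gets);
        // ... process wideRows ...

        // Alternative (Ted's suggestion): do a single scan over both families,
        // attach a filter that treats only KEY_FAMILY as essential, and call
        // scan.setLoadColumnFamiliesOnDemand(true); the non-essential DATA_FAMILY
        // is then loaded only for rows that pass the filter.

        table.close();
    }
}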

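On the block-size question in the subject line: the block size is a per-column-family setting, so (as Lars and Vladimir suggest) different values can only be chosen by benchmarking. A sketch with an illustrative 256K value and hypothetical table/family names, using the HBaseAdmin API of that era; the same change can be made from the HBase shell with alter 'mytable', {NAME => 'd', BLOCKSIZE => '262144'}:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class BlockSizeSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // Fetch the existing descriptor so other family settings are preserved,
        // then raise the wide family's block size from the default 64K to 256K.
        // Note that a 2MB KV still overflows (and grows) any reasonable block,
        // so the value here is purely illustrative.
        HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("mytable"));
        HColumnDescriptor dataFamily = desc.getFamily(Bytes.toBytes("d"));
        dataFamily.setBlocksize(256 * 1024);

        admin.disableTable("mytable");             // illustrative table name
        admin.modifyColumn("mytable", dataFamily);
        admin.enableTable("mytable");
        admin.close();
    }
}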