Yes, your row will be split at KV boundaries, so there is no need to increase the default block size except, possibly, for performance. You will need to try different sizes to find what is optimal for your use case.

I would not use a combination of scan & get on the same table:family with very large rows. Either some kind of secondary indexing is needed, or do the scan on a different column family (which has the same row keys):

table:family1 holds the original data; table:family2 holds only the row keys (no data) from table:family1. You scan family2 to find the matching keys, then get the full rows from family1. Your scan will be MUCH faster in this case (see the sketch below).
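[A minimal sketch of the scan-then-get layout Vladimir describes, not from the original thread: the table name "t", the family names "data" and "idx", and the row-key range are made-up placeholders, and the newer Connection/Table client API is assumed; adapt to your own schema.]

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WideRowScan {
        private static final byte[] DATA_CF = Bytes.toBytes("data"); // wide rows, ~1000 x 2K cells
        private static final byte[] IDX_CF  = Bytes.toBytes("idx");  // same row keys, no data

        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("t"))) {

                // Range scan over the key-only family: its blocks carry no 2K values,
                // so far less data is read than scanning the wide family directly.
                Scan scan = new Scan()
                        .withStartRow(Bytes.toBytes("row-0001"))
                        .withStopRow(Bytes.toBytes("row-9999"))
                        .addFamily(IDX_CF);

                try (ResultScanner scanner = table.getScanner(scan)) {
                    for (Result keyOnly : scanner) {
                        // Point get of the full wide row, only for rows that matched.
                        Result wideRow = table.get(new Get(keyOnly.getRow()).addFamily(DATA_CF));
                        // ... process the ~2MB row here ...
                    }
                }
            }
        }
    }

Keeping the key-only family in the same table (rather than a separate index table) keeps both views of a row co-located in the same region, so the follow-up get hits the same region server that served the scan.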
Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: [email protected]

________________________________________
From: Wei Tan [[email protected]]
Sent: Wednesday, January 29, 2014 12:52 PM
To: [email protected]
Subject: Re: larger HFile block size for very wide row?

Sorry, 1000 columns, each 2K, so each row is 2M. I guess HBase will keep a single KV (i.e., a column rather than a row) within a block, so a row will span multiple blocks?

My scan pattern is: I do a range scan, find the matching row keys, and then fetch the whole row for each row that matches my criteria.

Best regards,
Wei

---------------------------------
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan

From: lars hofhansl <[email protected]>
To: "[email protected]" <[email protected]>
Date: 01/29/2014 03:49 PM
Subject: Re: larger HFile block size for very wide row?

You have 1000 columns? Not 1000k = 1M columns, I assume. So you'll have 2MB KVs. That's a bit on the large side. HBase will "grow" the block to fit the KV into it, which means you have basically one block per KV.

I guess you address these rows via point gets (GET) and do not typically scan through them, right? Do you see any performance issues?

-- Lars

________________________________
From: Wei Tan <[email protected]>
To: [email protected]
Sent: Wednesday, January 29, 2014 12:35 PM
Subject: larger HFile block size for very wide row?

Hi, I have an HBase table where each row has ~1000k columns, ~2K each. My scan pattern is to use a row key filter, but I need to fetch the whole row (~1000k columns) back. Shall I set the HFile block size to be larger than the default 64K?

Thanks,
Wei

---------------------------------
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan
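[For reference on the block-size question itself: the HFile block size is a per-column-family setting, 64K by default. Below is a minimal sketch of creating such a table with a larger block size on the wide family, assuming the same made-up table/family names as in the earlier sketch and the HBase 2.x descriptor-builder API; older clients use HTableDescriptor/HColumnDescriptor.setBlocksize instead.]

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CreateWideRowTable {
        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Admin admin = conn.getAdmin()) {

                // With ~2K cells, a 64K block holds roughly 32 cells, so a 1000-cell row
                // spans ~30 blocks either way; a larger block mainly trades block-index size
                // and random-read cost against sequential-read efficiency -- benchmark it.
                TableDescriptorBuilder t = TableDescriptorBuilder.newBuilder(TableName.valueOf("t"))
                        .setColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("data"))
                                .setBlocksize(128 * 1024)   // e.g. try 128K and measure
                                .build())
                        .setColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("idx"))
                                .build());                  // key-only family keeps the default

                admin.createTable(t.build());
            }
        }
    }

The same setting is also available from the HBase shell as the BLOCKSIZE attribute on a column family in the create/alter commands.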
