Never mind my last question. I thought BLOCKSIZE was a table attribute, but it
is specific to a column family.
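
For anyone who finds this thread later, the family-level form looks something
like this (a sketch only; 'f' stands in for your actual family name, and the
table has to be disabled before altering it):

  disable 'tablename'
  alter 'tablename', {NAME => 'f', BLOCKSIZE => '1048576'}
  enable 'tablename'

After that, 'describe' shows the new BLOCKSIZE on the family.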
On Jul 27, 2010, at 1:19 PM, Andrew Nguyen wrote:
> I just attempted to change the block size, but the change doesn't seem to
> take. I am doing the following in the shell:
>
>> alter 'tablename', {METHOD=>'table_att', BLOCKSIZE=>1048576}
>
> It returns without any errors, but when I 'describe' the table, the setting
> is unchanged. Is there another place for me to make such changes to the table?
>
> Thanks!
>
>
> On Jul 27, 2010, at 10:39 AM, Jean-Daniel Cryans wrote:
>
>> I would try using the smallest keys you can (row key, family name,
>> qualifier), since they are all stored with each value and you want to
>> retrieve millions of them very quickly. Also do the usual things, like
>> LZO-compressing the families, and read as few families as possible at
>> the same time (e.g. if your table has 3 families, you shouldn't read
>> more than 1 at a time).
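>>
>> For example, a table created with a one-character family name and LZO
>> (a sketch; the names are placeholders, and LZO has to be installed on
>> the cluster):
>>
>>   create 'tablename', {NAME => 'f', COMPRESSION => 'LZO'}
>>
>> Then, on the read side, restrict each scan to the one family you need
>> (Scan.addFamily in the Java client).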
>>
>> J-D
>>
>> On Tue, Jul 27, 2010 at 10:30 AM, Andrew Nguyen
>> <[email protected]> wrote:
>>> Perfect, thanks. I will run some experiments and keep you posted.
>>>
>>> Aside from just getting elapsed time on scans of various sizes, are there
>>> any other tips on what sorts of measurements to perform? Also, since I'm
>>> running the experiments with various block sizes anyway, any requests for
>>> other types of benchmarks?
>>>
>>> Thanks,
>>> Andrew
>>>
>>> On Jul 27, 2010, at 10:13 AM, Jean-Daniel Cryans wrote:
>>>
>>>>> Thanks for the heads-up. Do you know what happens if I set this value
>>>>> larger than 5MB? We will always be scanning the data, and always in
>>>>> large blocks. I have yet to calculate the typical size of a single scan,
>>>>> but I imagine it will usually be larger than 1MB.
>>>>
>>>> I've never tried that, so it's hard to tell, but I'm always eager to
>>>> hear about others' experiences :)
>>>>
>>>>>
>>>>> Also, is there any way to change the block size with data already in
>>>>> HBase? Our current import process is very slow (preprocessing of the
>>>>> data) and we don't have the resources to store the preprocessed data.
>>>>
>>>> After altering the table, issue a major compaction on it and
>>>> everything will be rewritten with the new block size.
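>>>>
>>>> The compaction can be kicked off from the shell (sketch):
>>>>
>>>>   major_compact 'tablename'
>>>>
>>>> and the region servers rewrite the store files in the background.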
>>>>
>>>> J-D
>>>
>>>
>