Never mind my last question. I thought BLOCKSIZE was a table attribute, but it
is specific to a column family.
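
For anyone who finds this thread later, the family-level form looks something
like this (a sketch only; 'f' stands in for your actual family name, and the
table has to be disabled before altering it):

  disable 'tablename'
  alter 'tablename', {NAME => 'f', BLOCKSIZE => '1048576'}
  enable 'tablename'

After that, 'describe' shows the new BLOCKSIZE on the family.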
On Jul 27, 2010, at 1:19 PM, Andrew Nguyen wrote:
> I just attempted to change the block size, but the change doesn't seem to
> take. I am doing the following in the shell:
>
>> alter 'tablename', {METHOD=>'table_att', BLOCKSIZE=>1048576}
>
> It returns without any errors, but when I 'describe' the table, the setting
> is unchanged. Is there another place for me to make such changes to the table?
>
> Thanks!
>
>
> On Jul 27, 2010, at 10:39 AM, Jean-Daniel Cryans wrote:
>
>> I would try using the smallest keys you can (row key, family name,
>> qualifier), since they are all stored with each value and you want to
>> retrieve millions of them very quickly. Also do the usual things, like
>> LZO-compressing the families, and read as few families as possible at
>> the same time (e.g. if your table has 3 families, you shouldn't read
>> more than 1 at a time).
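>>
>> For example, a table created with a one-character family name and LZO
>> (a sketch; the names are placeholders, and LZO has to be installed on
>> the cluster):
>>
>>   create 'tablename', {NAME => 'f', COMPRESSION => 'LZO'}
>>
>> Then, on the read side, restrict each scan to the one family you need
>> (Scan.addFamily in the Java client).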
>>
>> J-D
>>
>> On Tue, Jul 27, 2010 at 10:30 AM, Andrew Nguyen
>> <[email protected]> wrote:
>>> Perfect, thanks. I will run some experiments and keep you posted.
>>>
>>> Aside from just getting elapsed time on scans of various sizes, are there
>>> any other tips on what sorts of measurements to perform? Also, since I'm
>>> running the experiments with various block sizes anyway, any requests for
>>> other types of benchmarks?
>>>
>>> Thanks,
>>> Andrew
>>>
>>> On Jul 27, 2010, at 10:13 AM, Jean-Daniel Cryans wrote:
>>>
>>>>> Thanks for the heads-up. Do you know what happens if I set this value
>>>>> larger than 5MB? We will always be scanning the data, and always in
>>>>> large blocks. I have yet to calculate the typical size of a single scan,
>>>>> but I imagine it will usually be larger than 1MB.
>>>>
>>>> I've never tried that, so it's hard to tell, but I'm always eager to
>>>> hear about others' experiences :)
>>>>
>>>>>
>>>>> Also, is there any way to change the block size with data already in
>>>>> HBase? Our current import process is very slow (preprocessing of the
>>>>> data) and we don't have the resources to store the preprocessed data.
>>>>
>>>> After altering the table, issue a major compaction on it and
>>>> everything will be rewritten with the new block size.
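>>>>
>>>> The compaction can be kicked off from the shell (sketch):
>>>>
>>>>   major_compact 'tablename'
>>>>
>>>> and the region servers rewrite the store files in the background.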
>>>>
>>>> J-D
>>>
>>>
>