I just attempted to change the block size and it doesn't seem to be taking
effect. I am doing the following in the shell:
> alter 'tablename', {METHOD=>'table_att', BLOCKSIZE=>1048576}
It returns without any errors, but when I 'describe' the table, the setting is
unchanged. Is there another place for me to make such changes to the table?
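Should I be setting it on the column family instead, e.g. something like the
following (where 'cf' stands in for one of my actual families; I'm guessing at
the syntax)?
> alter 'tablename', {NAME=>'cf', BLOCKSIZE=>1048576}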
Thanks!
On Jul 27, 2010, at 10:39 AM, Jean-Daniel Cryans wrote:
> I would try using the smallest keys you can (row key, family name,
> qualifier) as they are all stored with each value and you want to
> retrieve millions of them very quickly. Also do the usual things, like
> compressing the families with LZO and reading as few families as possible
> at the same time (e.g. if your table has 3 families, you shouldn't read
> more than 1 at a time).
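>
> For example, something along these lines at table creation time (the family
> name 'd' is just an illustration, and LZO has to be installed on the cluster
> separately):
>
>   create 'tablename', {NAME => 'd', COMPRESSION => 'LZO'}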
>
> J-D
>
> On Tue, Jul 27, 2010 at 10:30 AM, Andrew Nguyen
> <[email protected]> wrote:
>> Perfect, thanks. I will run some experiments and keep you posted.
>>
>> Aside from just getting elapsed time on scans of various sizes, are there
>> any other tips on what sorts of measurements to perform? Also, since I'm
>> doing the experiments with various block sizes anyway, any requests for
>> other types of benchmarks?
>>
>> Thanks,
>> Andrew
>>
>> On Jul 27, 2010, at 10:13 AM, Jean-Daniel Cryans wrote:
>>
>>>> Thanks for the heads up. Do you know what happens if I set this value
>>>> larger than 5MB? We will always be scanning the data, and always in large
>>>> blocks. I have yet to calculate the typical size of a single scan, but I
>>>> imagine that it will usually be larger than 1MB.
>>>
>>> I never tried that, so it's hard to tell, but I'm always eager to hear
>>> about others' experiences :)
>>>
>>>>
>>>> Also, is there any way to change the block size with data already in
>>>> HBase? Our current import process is very slow (due to the preprocessing
>>>> of the data) and we don't have the resources to store the preprocessed data.
>>>
>>> After altering the table, issue a major compaction on it and
>>> everything will be rewritten with the new block size.
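>>>
>>> From the shell that's something like:
>>>
>>>   major_compact 'tablename'
>>>
>>> It's asynchronous, so give it some time to finish before measuring.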
>>>
>>> J-D
>>
>>