Fair enough.
St.Ack
On Wed, Sep 22, 2010 at 11:09 AM, Jack Levin <[email protected]> wrote:
> LZO of image data, which is already JPEG? Probably not a great idea, yes?
>
> -Jack
>
> On Wed, Sep 22, 2010 at 11:06 AM, Stack <[email protected]> wrote:
>> Are you lzo'ing, Jack? If not, you probably should.
>> St.Ack
>>
>> On Wed, Sep 22, 2010 at 3:17 AM, Jack Levin <[email protected]> wrote:
>>> So our cell sizes will be 350KB on average, with 5-10 terabytes per
>>> server; I just want to keep the count of regions under 1000 per server.
>>>
>>> -Jack
>>>
>>> On Sep 22, 2010, at 2:44 AM, Ryan Rawson <[email protected]> wrote:
>>>
>>>> Region size is one of those tricky things; there are a few factors to
>>>> consider:
>>>>
>>>> - Regions are the basic element of availability and distribution.
>>>> - HBase scales by having regions on many servers. Thus if you have
>>>> 2 regions for 16GB of data on a 20-node cluster, you are a net loss
>>>> there.
>>>> - A high region count has been known to make things slow. This is
>>>> getting better, but it is probably better to have 700 regions than
>>>> 3000 for the same amount of data.
>>>> - A low region count prevents parallel scalability, as per point #2.
>>>> This really can't be stressed enough, since a common problem is
>>>> loading 200MB of data into HBase and then wondering why your awesome
>>>> 10-node cluster is mostly idle.
>>>> - There is not much memory-footprint difference between 1 region and
>>>> 10 in terms of indexes, etc., held by the regionserver.
>>>>
>>>> Generally speaking, I stick to the default, go smaller for hot tables
>>>> (or manually split them), and go with a 1GB region size on our
>>>> largest, 900GB table.
>>>>
>>>> -ryan
>>>>
>>>> On Wed, Sep 22, 2010 at 12:01 AM, Jack Levin <[email protected]> wrote:
>>>>> Yes, I am thinking to put 10 to 15 million files on each regionserver
>>>>> (well, not literally, but controlled by the regionserver). So that's
>>>>> close to 4 TB worth of regions, which is about 4GB per region should
>>>>> we target 1000 regions per server. Note, not all files are 'hot';
>>>>> I only expect to keep about 1% super hot and 5% relatively hot, and
>>>>> the rest are cold. So in terms of keeping HBase blocks in RAM, that
>>>>> should be adequate; for the rest we can afford a trip into HDFS.
>>>>>
>>>>> If servers are running 8 GB of RAM, and are shared between
>>>>> regionservers and datanodes, how much heap should I allocate to each?
>>>>> 6GB for RS and 1GB for DN?
>>>>>
>>>>> Also, on the question of whether 8 cores x 16GB of RAM help a Master
>>>>> server bring up the cluster faster, the answer is definitely yes. It
>>>>> took only 90 seconds to load 5000 regions across 13 servers, where
>>>>> the same task took nearly 10 minutes on a dual-core, 8GB RAM machine.
>>>>>
>>>>> -Jack
>>>>>
>>>>> On Tue, Sep 21, 2010 at 11:38 PM, Stack <[email protected]> wrote:
>>>>>> On Tue, Sep 21, 2010 at 11:11 PM, Jack Levin <[email protected]> wrote:
>>>>>>> It's definitely binary, and I can even load it in my browser by
>>>>>>> setting appropriate headers. So I guess for PUT and GET via Accept:
>>>>>>> application/octet-stream there is no base64 encoding at all.
>>>>>>
>>>>>> OK. Good. If it were base64'd, you'd see it.
>>>>>>
>>>>>>> Btw, out of curiosity, I have the region max file size set to 1GB
>>>>>>> now, but what if I set it to, say, 10G or 50G? Is there significant
>>>>>>> overhead in address seeking via HDFS?
>>>>>>
>>>>>> You could do that. We don't have much experience running regions of
>>>>>> that size. You should for sure pre-split your table on creation if
>>>>>> you go this route (see the HBaseAdmin API [1]; this method is not
>>>>>> available in the shell, so you'd have to script it or write a little
>>>>>> Java to do it).
>>>>>>
>>>>>> St.Ack
>>>>>>
>>>>>> 1. http://hbase.apache.org/docs/r0.89.20100726/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#createTable(org.apache.hadoop.hbase.HTableDescriptor, byte[][])
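
As a concrete illustration of the pre-split Stack recommends above, here is a
minimal sketch against the 0.89/0.90-era Java client, using the
createTable(HTableDescriptor, byte[][]) overload from [1]. The table name
("images"), family ("img"), region size, and split keys are all hypothetical;
real split points should match your row-key distribution.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // Hypothetical table layout; adjust names to taste.
    HTableDescriptor desc = new HTableDescriptor("images");
    // Going the big-region route: raise the split threshold to ~10G
    // (the per-table equivalent of hbase.hregion.max.filesize).
    desc.setMaxFileSize(10L * 1024 * 1024 * 1024);
    desc.addFamily(new HColumnDescriptor("img"));

    // Seven split keys yield eight initial regions. These keys assume
    // rows with an evenly distributed hex-ish prefix; pick keys that
    // match your own key space.
    byte[][] splits = new byte[][] {
        Bytes.toBytes("2"), Bytes.toBytes("4"), Bytes.toBytes("6"),
        Bytes.toBytes("8"), Bytes.toBytes("a"), Bytes.toBytes("c"),
        Bytes.toBytes("e"),
    };
    admin.createTable(desc, splits);
  }
}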

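Similarly, since the thread touches on fetching binary cells through the REST
gateway with Accept: application/octet-stream, here is a quick sketch of what
that looks like from Java. The gateway host, port, table, row, and column
names are made up for illustration.

import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class RestBinaryGet {
  public static void main(String[] args) throws Exception {
    // Hypothetical REST gateway host/port and table/row/column.
    URL url = new URL("http://gateway.example.com:8080/images/row-00042/img:data");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    // Asking for octet-stream returns the raw cell value; the XML/JSON
    // representations base64-encode it, which is how you would spot
    // the difference Jack describes.
    conn.setRequestProperty("Accept", "application/octet-stream");

    InputStream in = conn.getInputStream();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[8192];
    for (int n; (n = in.read(buf)) != -1; ) {
      out.write(buf, 0, n);
    }
    in.close();
    System.out.println("Fetched " + out.size() + " bytes, Content-Type: "
        + conn.getContentType());
  }
}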