Tell me why your RS heap needs to be that large (> 8 GB). I think the answer is that it depends, especially when you start to add in coprocessors. I'm not saying that there are no legitimate reasons, but a lot of the time people just bump up the heap size without thinking about the problem. To Kevin's point, once you exceed a certain size you really need to start thinking about the tuning process.
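For what it's worth, the knobs Kevin's list below keeps coming back to are MSLAB and the ZooKeeper session timeout. Here is a minimal sketch of the corresponding settings using the stock property names; the values are placeholders, not recommendations, and in practice they belong in hbase-site.xml (with the GC flags themselves going into hbase-env.sh):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class HeapTuningSketch {
    public static void main(String[] args) {
        // Start from the stock HBase configuration (hbase-site.xml on the classpath).
        Configuration conf = HBaseConfiguration.create();

        // MSLAB (MemStore-Local Allocation Buffers) reduces old-gen fragmentation
        // from memstore churn; it is on by default in recent releases.
        conf.setBoolean("hbase.hregion.memstore.mslab.enabled", true);

        // With a large heap, a long GC pause can blow past the ZooKeeper session
        // timeout and get the RegionServer declared dead. Placeholder value only.
        conf.setInt("zookeeper.session.timeout", 120000); // 120 s, illustrative

        System.out.println("MSLAB enabled: "
                + conf.getBoolean("hbase.hregion.memstore.mslab.enabled", false));
    }
}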
MSLAB is now on by default, or so I am told. Just because you can do something doesn't mean it's a good idea. ;-)

On Apr 30, 2013, at 7:01 AM, Kevin O'dell <[email protected]> wrote:

> Asaf,
>
> The heap barrier is something of a legend :) You can ask 10 different
> HBase committers what they think the max heap is and get 10 different
> answers. This is my take on heap sizes from the many clusters I have
> dealt with:
>
> 8GB -> Standard heap size, and tends to run fine without any tuning
>
> 12GB -> Needs some TLC with regard to JVM tuning if your workload tends
> to cause churn (usually block cache)
>
> 16GB -> GC tuning is a must, and now we need to start looking into MSLAB
> and ZK timeouts
>
> 20GB -> Same as 16GB in regard to tuning, but we tend to need to raise
> the ZK timeout a little higher
>
> 32GB -> We do have a couple of people running this high, but the pain
> outweighs the gains (IMHO)
>
> 64GB -> Let me know how it goes :)
>
> On Tue, Apr 30, 2013 at 4:07 AM, Andrew Purtell <[email protected]> wrote:
>
>> I don't wish to be rude, but you are presenting odd claims as fact, as
>> "mentioned in a couple of posts". It will be difficult to have a serious
>> conversation. I encourage you to test your hypotheses and let us know
>> whether in fact there is a JVM "heap barrier" (and where it may be).
>>
>> On Monday, April 29, 2013, Asaf Mesika wrote:
>>
>>> I think for Phoenix truly to succeed, it needs HBase to break the JVM
>>> heap barrier of 12G that I saw mentioned in a couple of posts. Lots of
>>> analytics queries utilize memory, and since that memory is shared with
>>> HBase, there is only so much you can do on a 12GB heap. On the other
>>> hand, if Phoenix were implemented outside HBase on the same machine
>>> (like Drill or Impala do), you could have 60GB for that process,
>>> running many OLAP queries in parallel, utilizing the same data set.
>>>
>>> On Mon, Apr 29, 2013 at 9:08 PM, Andrew Purtell <[email protected]> wrote:
>>>
>>>>> HBase is not really intended for heavy data crunching
>>>>
>>>> Yes it is. This is why we have first-class MapReduce integration and
>>>> optimized scanners.
>>>>
>>>> Recent versions, like 0.94, also do pretty well with the 'O' part of
>>>> OLAP.
>>>>
>>>> Urban Airship's Datacube is an example of a successful OLAP project
>>>> implemented on HBase: http://github.com/urbanairship/datacube
>>>>
>>>> "Urban Airship uses the datacube project to support its analytics
>>>> stack for mobile apps. We handle about ~10K events per second per
>>>> node."
>>>>
>>>> Also there is Adobe's SaasBase:
>>>> http://www.slideshare.net/clehene/hbase-and-hadoop-at-adobe
>>>>
>>>> Etc.
>>>>
>>>> Where an HBase OLAP application will differ tremendously from a
>>>> traditional data warehouse is of course in the interface to the
>>>> datastore. You have to design and speak in the language of the HBase
>>>> API, though Phoenix (https://github.com/forcedotcom/phoenix) is
>>>> changing that.
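Andrew's point about the MapReduce integration is easy to see in code. Below is a rough sketch of wiring a full-table scan into a map-only job with the 0.94-era client API; the table name "events", the column family "d", and the caching value are made up for illustration, not taken from anything above.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class CrunchJob {

    // Minimal mapper that just counts rows; the real crunching would go here.
    public static class RowCountMapper
            extends TableMapper<ImmutableBytesWritable, Result> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context ctx)
                throws IOException, InterruptedException {
            ctx.getCounter("crunch", "rows").increment(1);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "hbase-crunch-sketch");
        job.setJarByClass(CrunchJob.class);

        Scan scan = new Scan();
        scan.setCaching(500);               // rows fetched per RPC; placeholder value
        scan.setCacheBlocks(false);         // keep a full scan out of the block cache
        scan.addFamily(Bytes.toBytes("d")); // hypothetical column family

        // "events" is a made-up table name for illustration.
        TableMapReduceUtil.initTableMapperJob(
                "events", scan, RowCountMapper.class, null, null, job);
        job.setOutputFormatClass(NullOutputFormat.class);
        job.setNumReduceTasks(0);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}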
>>>> On Sun, Apr 28, 2013 at 10:21 PM, anil gupta <[email protected]> wrote:
>>>>
>>>>> Hi Kiran,
>>>>>
>>>>> In HBase the data is denormalized, but at its core HBase is a
>>>>> KeyValue-based database meant for lookups or queries that expect a
>>>>> response in milliseconds. OLAP, i.e. data warehousing, usually
>>>>> involves heavy data crunching. HBase is not really intended for
>>>>> heavy data crunching. If you just want to store denormalized data
>>>>> and do simple queries, then HBase is good. For OLAP kind of stuff,
>>>>> you can make HBase work, but IMO you will be better off using Hive
>>>>> for data warehousing.
>>>>>
>>>>> HTH,
>>>>> Anil Gupta
>>>>>
>>>>> On Sun, Apr 28, 2013 at 8:39 PM, Kiran <[email protected]> wrote:
>>>>>
>>>>>> But in HBase data can be said to be in a denormalised state, as the
>>>>>> methodology used for storage is a (column family:column) based
>>>>>> flexible schema. Also, from Google's Bigtable paper it is evident
>>>>>> that HBase is capable of doing OLAP. So where does the difference
>>>>>> lie?
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://apache-hbase.679495.n3.nabble.com/HBase-and-Datawarehouse-tp4043172p4043216.html
>>>>>> Sent from the HBase User mailing list archive at Nabble.com.
>>>>
>>>> --
>>>> Best regards,
>>>>
>>>>   - Andy
>>>>
>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>>> (via Tom White)
>>
>> --
>> Best regards,
>>
>>   - Andy
>>
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> (via Tom White)
>
> --
> Kevin O'Dell
> Systems Engineer, Cloudera
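On the Phoenix angle: the attraction is that you stop hand-coding scans and speak SQL over the same HBase tables through a JDBC driver. A rough sketch of what that looks like, assuming the Phoenix client jar is on the classpath; the "localhost" quorum, the EVENTS table, and its columns are invented for illustration:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixSketch {
    public static void main(String[] args) throws Exception {
        // "localhost" stands in for the ZooKeeper quorum of a real cluster.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
             Statement stmt = conn.createStatement()) {

            // EVENTS and its HOST column are hypothetical; Phoenix maps the query
            // onto HBase scans and aggregates on the server side.
            ResultSet rs = stmt.executeQuery(
                "SELECT host, COUNT(*) AS event_count FROM events GROUP BY host");
            while (rs.next()) {
                System.out.println(rs.getString("HOST") + " -> "
                        + rs.getLong("EVENT_COUNT"));
            }
        }
    }
}

Phoenix pushes that aggregation down into the RegionServers via coprocessors, which is exactly why Asaf's heap question matters: the work competes with the block cache and memstores for the same heap.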
