Multiple RS per host? Huh? That seems very counterintuitive and potentially problematic with M/R jobs. Could you expand on this?
Thx -Mike

On Apr 30, 2013, at 12:38 PM, Andrew Purtell <[email protected]> wrote:

> Rules of thumb for starting off safely and for easing support issues are
> really good to have, but there are no hard barriers or singular approaches:
> use Java 7 + G1GC, disable the HBase blockcache in favor of the OS
> blockcache, run multiple regionservers per host. It is going to depend on
> how the cluster is used and loaded. If we are talking about coprocessors,
> then effective limits are less clear; using a coprocessor to integrate an
> external process implemented with native code communicating over memory
> mapped files in /dev/shm isn't outside what is possible (strawman alert).
>
>
> On Tue, Apr 30, 2013 at 5:01 AM, Kevin O'dell <[email protected]> wrote:
>
>> Asaf,
>>
>> The heap barrier is something of a legend :) You can ask 10 different
>> HBase committers what they think the max heap is and get 10 different
>> answers. This is my take on heap sizes from the many clusters I have
>> dealt with:
>>
>> 8GB -> Standard heap size, and tends to run fine without any tuning
>>
>> 12GB -> Needs some TLC with regards to JVM tuning if your workload tends
>> to cause churn (usually blockcache)
>>
>> 16GB -> GC tuning is a must, and now we need to start looking into MSLAB
>> and ZK timeouts
>>
>> 20GB -> Same as 16GB in regards to tuning, but we tend to need to raise
>> the ZK timeout a little higher
>>
>> 32GB -> We do have a couple of people running this high, but the pain
>> outweighs the gains (IMHO)
>>
>> 64GB -> Let me know how it goes :)
>>
>>
>> On Tue, Apr 30, 2013 at 4:07 AM, Andrew Purtell <[email protected]>
>> wrote:
>>
>>> I don't wish to be rude, but you are presenting odd claims as fact
>>> because they were "mentioned in a couple of posts". It will be difficult
>>> to have a serious conversation. I encourage you to test your hypotheses
>>> and let us know if in fact there is a JVM "heap barrier" (and where it
>>> may be).
>>>
>>> On Monday, April 29, 2013, Asaf Mesika wrote:
>>>
>>>> I think for Phoenix truly to succeed, it needs HBase to break the JVM
>>>> heap barrier of 12G, as I saw mentioned in a couple of posts. Since
>>>> lots of analytics queries utilize memory, and since that memory is
>>>> shared with HBase, there is only so much you can do on a 12GB heap. On
>>>> the other hand, if Phoenix were implemented outside HBase on the same
>>>> machine (like Drill or Impala is doing), you could have 60GB for this
>>>> process, running many OLAP queries in parallel, utilizing the same
>>>> data set.
>>>>
>>>>
>>>> On Mon, Apr 29, 2013 at 9:08 PM, Andrew Purtell <[email protected]>
>>>> wrote:
>>>>
>>>>>> HBase is not really intended for heavy data crunching
>>>>>
>>>>> Yes it is. This is why we have first class MapReduce integration and
>>>>> optimized scanners.
>>>>>
>>>>> Recent versions, like 0.94, also do pretty well with the 'O' part of
>>>>> OLAP.
>>>>>
>>>>> Urban Airship's Datacube is an example of a successful OLAP project
>>>>> implemented on HBase: http://github.com/urbanairship/datacube
>>>>>
>>>>> "Urban Airship uses the datacube project to support its analytics
>>>>> stack for mobile apps. We handle about ~10K events per second per
>>>>> node."
>>>>>
>>>>> Also there is Adobe's SaasBase:
>>>>> http://www.slideshare.net/clehene/hbase-and-hadoop-at-adobe
>>>>>
>>>>> Etc.
>>>>>
>>>>> Where an HBase OLAP application will differ tremendously from a
>>>>> traditional data warehouse is of course in the interface to the
>>>>> datastore. You have to design and speak in the language of the HBase
>>>>> API, though Phoenix (https://github.com/forcedotcom/phoenix) is
>>>>> changing that.
>>>>>
>>>>>
>>>>> On Sun, Apr 28, 2013 at 10:21 PM, anil gupta <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Kiran,
>>>>>>
>>>>>> In HBase the data is denormalized, but at its core HBase is a
>>>>>> KeyValue-based database meant for lookups or queries that expect a
>>>>>> response in milliseconds. OLAP, i.e. data warehousing, usually
>>>>>> involves heavy data crunching. HBase is not really intended for
>>>>>> heavy data crunching. If you want to just store denormalized data
>>>>>> and do simple queries, then HBase is good. For OLAP kinds of
>>>>>> workloads you can make HBase work, but IMO you will be better off
>>>>>> using Hive for data warehousing.
>>>>>>
>>>>>> HTH,
>>>>>> Anil Gupta
>>>>>>
>>>>>>
>>>>>> On Sun, Apr 28, 2013 at 8:39 PM, Kiran <[email protected]> wrote:
>>>>>>
>>>>>>> But in HBase the data can be said to be in a denormalised state,
>>>>>>> as the methodology used for storage is a (column family:column)
>>>>>>> based flexible schema. Also, from Google's Bigtable paper it is
>>>>>>> evident that HBase is capable of doing OLAP. So where does the
>>>>>>> difference lie?
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> View this message in context:
>>>>>>> http://apache-hbase.679495.n3.nabble.com/HBase-and-Datawarehouse-tp4043172p4043216.html
>>>>>>> Sent from the HBase User mailing list archive at Nabble.com.
>>>>>>
>>>>>
>>>>> --
>>>>> Best regards,
>>>>>
>>>>> - Andy
>>>>>
>>>>> Problems worthy of attack prove their worth by hitting back. - Piet
>>>>> Hein (via Tom White)
>>>>
>>>
>>>
>>> --
>>> Best regards,
>>>
>>> - Andy
>>>
>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>> (via Tom White)
>>
>>
>> --
>> Kevin O'Dell
>> Systems Engineer, Cloudera
>
>
> --
> Best regards,
>
> - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
