You wouldn't do that if colocating MR. It is one way to soak up "extra" RAM
on a large-RAM box, although I'm not sure I would recommend it (I have no
personal experience trying it yet). For more on this, where people are
actively considering it, see https://issues.apache.org/jira/browse/BIGTOP-732
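
If you do want to experiment with it, the main collision to avoid is on
ports: each extra regionserver instance on a host needs its own RPC and info
ports. A minimal sketch of the overrides, assuming 0.94-era property names
and defaults (in practice you would put these in a second hbase-site.xml
rather than set them in code):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class SecondRegionServerConf {
        public static Configuration create() {
            Configuration conf = HBaseConfiguration.create();
            // A second regionserver on the same host must not collide
            // with the first one's listening ports.
            conf.setInt("hbase.regionserver.port", 60021);      // default 60020
            conf.setInt("hbase.regionserver.info.port", 60031); // default 60030
            return conf;
        }
    }

You would also need distinct log and pid locations per instance, which is
the sort of packaging problem the BIGTOP issue above discusses.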
On Tue, Apr 30, 2013 at 11:14 AM, Michael Segel <[email protected]> wrote:

> Multiple RS per host?
> Huh?
>
> That seems very counterintuitive and potentially problematic with M/R
> jobs. Could you expand on this?
>
> Thx
>
> -Mike
>
> On Apr 30, 2013, at 12:38 PM, Andrew Purtell <[email protected]> wrote:
>
> > Rules of thumb for starting off safely and for easing support issues
> > are really good to have, but there are no hard barriers or singular
> > approaches: use Java 7 + G1GC, disable the HBase blockcache in lieu of
> > the OS blockcache, run multiple regionservers per host. It is going to
> > depend on how the cluster is used and loaded. If we are talking about
> > coprocessors, then effective limits are less clear: using a coprocessor
> > to integrate an external process implemented with native code,
> > communicating over memory-mapped files in /dev/shm, isn't outside what
> > is possible (strawman alert).
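
To make that strawman slightly more concrete: the shared-memory half of such
an integration can be as simple as a mapped file. A minimal, hypothetical
Java sketch, where the /dev/shm path and the single-int "protocol" are
invented purely for illustration:

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class ShmHandoff {
        public static void main(String[] args) throws Exception {
            // Hypothetical handoff file in /dev/shm, shared with a
            // co-located native process (e.g. one driven by a coprocessor).
            try (RandomAccessFile file =
                     new RandomAccessFile("/dev/shm/hbase-handoff", "rw");
                 FileChannel channel = file.getChannel()) {
                MappedByteBuffer buf =
                    channel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
                buf.putInt(0, 42); // becomes visible to the other process
                buf.force();       // flush the mapped region
            }
        }
    }

Since /dev/shm is tmpfs, both processes are reading and writing the same
physical pages, so the handoff costs no disk I/O.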
> > On Tue, Apr 30, 2013 at 5:01 AM, Kevin O'dell <[email protected]> wrote:
> >
> > > Asaf,
> > >
> > > The heap barrier is something of a legend :) You can ask 10 different
> > > HBase committers what they think the max heap is and get 10 different
> > > answers. This is my take on heap sizes from the many clusters I have
> > > dealt with:
> > >
> > > 8GB -> Standard heap size; tends to run fine without any tuning
> > >
> > > 12GB -> Needs some TLC with regard to JVM tuning if your workload
> > > tends to cause churn (usually blockcache)
> > >
> > > 16GB -> GC tuning is a must, and now we need to start looking into
> > > MSLAB and ZK timeouts
> > >
> > > 20GB -> Same as 16GB in regard to tuning, but we tend to need to
> > > raise the ZK timeout a little higher
> > >
> > > 32GB -> We do have a couple of people running this high, but the
> > > pain outweighs the gains (IMHO)
> > >
> > > 64GB -> Let me know how it goes :)
> > >
> > > On Tue, Apr 30, 2013 at 4:07 AM, Andrew Purtell <[email protected]> wrote:
> > >
> > > > I don't wish to be rude, but you are presenting odd claims as fact
> > > > because they were "mentioned in a couple of posts". It will be
> > > > difficult to have a serious conversation. I encourage you to test
> > > > your hypotheses and let us know if in fact there is a JVM "heap
> > > > barrier" (and where it may be).
> > > >
> > > > On Monday, April 29, 2013, Asaf Mesika wrote:
> > > >
> > > > > I think for Phoenix truly to succeed, it needs HBase to break the
> > > > > JVM heap barrier of 12G that I saw mentioned in a couple of
> > > > > posts. Lots of analytics queries utilize memory, and since that
> > > > > memory is shared with HBase, there is only so much you can do on
> > > > > a 12GB heap. On the other hand, if Phoenix were implemented
> > > > > outside HBase on the same machine (as Drill and Impala do), you
> > > > > could have 60GB for this process, running many OLAP queries in
> > > > > parallel against the same data set.
> > > > >
> > > > > On Mon, Apr 29, 2013 at 9:08 PM, Andrew Purtell <[email protected]> wrote:
> > > > >
> > > > > > > HBase is not really intended for heavy data crunching
> > > > > >
> > > > > > Yes it is. This is why we have first-class MapReduce
> > > > > > integration and optimized scanners.
> > > > > >
> > > > > > Recent versions, like 0.94, also do pretty well with the 'O'
> > > > > > part of OLAP.
> > > > > >
> > > > > > Urban Airship's Datacube is an example of a successful OLAP
> > > > > > project implemented on HBase:
> > > > > > http://github.com/urbanairship/datacube
> > > > > >
> > > > > > "Urban Airship uses the datacube project to support its
> > > > > > analytics stack for mobile apps. We handle about ~10K events
> > > > > > per second per node."
> > > > > >
> > > > > > Also there is Adobe's SaasBase:
> > > > > > http://www.slideshare.net/clehene/hbase-and-hadoop-at-adobe
> > > > > >
> > > > > > Etc.
> > > > > >
> > > > > > Where an HBase OLAP application will differ tremendously from
> > > > > > a traditional data warehouse is of course in the interface to
> > > > > > the datastore. You have to design and speak in the language of
> > > > > > the HBase API, though Phoenix
> > > > > > (https://github.com/forcedotcom/phoenix) is changing that.
> > > > > >
> > > > > > On Sun, Apr 28, 2013 at 10:21 PM, anil gupta <[email protected]> wrote:
> > > > > >
> > > > > > > Hi Kiran,
> > > > > > >
> > > > > > > In HBase the data is denormalized, but at its core HBase is
> > > > > > > a KeyValue-based database meant for lookups or queries that
> > > > > > > expect responses in milliseconds. OLAP, i.e. data
> > > > > > > warehousing, usually involves heavy data crunching. HBase is
> > > > > > > not really intended for heavy data crunching. If you just
> > > > > > > want to store denormalized data and run simple queries, then
> > > > > > > HBase is good. For OLAP kinds of workloads you can make
> > > > > > > HBase work, but IMO you will be better off using Hive for
> > > > > > > data warehousing.
> > > > > > >
> > > > > > > HTH,
> > > > > > > Anil Gupta
> > > > > > >
> > > > > > > On Sun, Apr 28, 2013 at 8:39 PM, Kiran <[email protected]> wrote:
> > > > > > >
> > > > > > > > But in HBase, data can be said to be in a denormalized
> > > > > > > > state, as the methodology used for storage is a (column
> > > > > > > > family:column) based flexible schema. Also, from Google's
> > > > > > > > Bigtable paper it is evident that HBase is capable of
> > > > > > > > doing OLAP. So where does the difference lie?
> > > > > > > >
> > > > > > > > --
> > > > > > > > View this message in context:
> > > > > > > > http://apache-hbase.679495.n3.nabble.com/HBase-and-Datawarehouse-tp4043172p4043216.html
> > > > > > > > Sent from the HBase User mailing list archive at Nabble.com.
> > >
> > > --
> > > Kevin O'Dell
> > > Systems Engineer, Cloudera
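
Coming back to Kevin's ladder: the MSLAB and ZK-timeout knobs he mentions
are ordinary configuration properties. A sketch with the 0.94-era names (the
120-second timeout is just an example value, and these would normally go in
hbase-site.xml rather than code):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class BigHeapTuning {
        public static Configuration create() {
            Configuration conf = HBaseConfiguration.create();
            // MSLAB allocates memstore data in fixed-size chunks, which
            // reduces the old-gen fragmentation behind the churn Kevin
            // describes.
            conf.setBoolean("hbase.hregion.memstore.mslab.enabled", true);
            // A longer ZK session (in ms) keeps a long GC pause from
            // getting the regionserver declared dead.
            conf.setInt("zookeeper.session.timeout", 120000);
            return conf;
        }
    }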
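
On Anil's point about lookups: the millisecond sweet spot he describes is
the single-row read, which in the 0.94 client API looks roughly like the
sketch below. The table "metrics", family "d", qualifier "count", and the
long-encoded value are all hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PointLookup {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "metrics");
            try {
                // A single-row Get: the access pattern HBase is built
                // around, served from memstore/blockcache when hot.
                Get get = new Get(Bytes.toBytes("row-key"));
                Result result = table.get(get);
                byte[] value = result.getValue(Bytes.toBytes("d"),
                                               Bytes.toBytes("count"));
                System.out.println(Bytes.toLong(value));
            } finally {
                table.close();
            }
        }
    }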
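
And the interface shift Phoenix brings is that the same store can be
queried over plain JDBC. A sketch, assuming the Phoenix driver is on the
classpath and using its jdbc:phoenix:<zookeeper quorum> URL form; the table
and query are hypothetical:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class PhoenixQuery {
        public static void main(String[] args) throws Exception {
            Connection conn =
                DriverManager.getConnection("jdbc:phoenix:localhost");
            try {
                Statement stmt = conn.createStatement();
                // A hypothetical aggregate that Phoenix compiles into
                // HBase scans under the covers.
                ResultSet rs = stmt.executeQuery(
                    "SELECT host, COUNT(*) FROM metrics GROUP BY host");
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " " + rs.getLong(2));
                }
            } finally {
                conn.close();
            }
        }
    }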

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)