You wouldn't do that if colocating MR. It is one way to soak up "extra" RAM
on a large-RAM box, although I'm not sure I would recommend it (I have no
personal experience trying it yet). For more on this, where people are
actively considering it, see https://issues.apache.org/jira/browse/BIGTOP-732
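
If you do want to experiment with it, the main collision to avoid is on
ports: each extra regionserver instance on a host needs its own RPC and info
ports. A minimal sketch of the overrides, assuming 0.94-era property names
and defaults (in practice you would put these in a second hbase-site.xml
rather than set them in code):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class SecondRegionServerConf {
        public static Configuration create() {
            Configuration conf = HBaseConfiguration.create();
            // A second regionserver on the same host must not collide
            // with the first one's listening ports.
            conf.setInt("hbase.regionserver.port", 60021);      // default 60020
            conf.setInt("hbase.regionserver.info.port", 60031); // default 60030
            return conf;
        }
    }

You would also need distinct log and pid locations per instance, which is
the sort of packaging problem the BIGTOP issue above discusses.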
On Tue, Apr 30, 2013 at 11:14 AM, Michael Segel <[email protected]> wrote:

> Multiple RS per host?
> Huh?
>
> That seems very counterintuitive and potentially problematic with M/R
> jobs. Could you expand on this?
>
> Thx
>
> -Mike
>
> On Apr 30, 2013, at 12:38 PM, Andrew Purtell <[email protected]> wrote:
>
> > Rules of thumb for starting off safely and for easing support issues
> > are really good to have, but there are no hard barriers or singular
> > approaches: use Java 7 + G1GC, disable the HBase blockcache in lieu of
> > the OS blockcache, run multiple regionservers per host. It is going to
> > depend on how the cluster is used and loaded. If we are talking about
> > coprocessors, then effective limits are less clear: using a coprocessor
> > to integrate an external process implemented with native code,
> > communicating over memory-mapped files in /dev/shm, isn't outside what
> > is possible (strawman alert).
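
To make that strawman slightly more concrete: the shared-memory half of such
an integration can be as simple as a mapped file. A minimal, hypothetical
Java sketch, where the /dev/shm path and the single-int "protocol" are
invented purely for illustration:

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class ShmHandoff {
        public static void main(String[] args) throws Exception {
            // Hypothetical handoff file in /dev/shm, shared with a
            // co-located native process (e.g. one driven by a coprocessor).
            try (RandomAccessFile file =
                     new RandomAccessFile("/dev/shm/hbase-handoff", "rw");
                 FileChannel channel = file.getChannel()) {
                MappedByteBuffer buf =
                    channel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
                buf.putInt(0, 42); // becomes visible to the other process
                buf.force();       // flush the mapped region
            }
        }
    }

Since /dev/shm is tmpfs, both processes are reading and writing the same
physical pages, so the handoff costs no disk I/O.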
> > On Tue, Apr 30, 2013 at 5:01 AM, Kevin O'dell <[email protected]> wrote:
> >
> > > Asaf,
> > >
> > > The heap barrier is something of a legend :) You can ask 10 different
> > > HBase committers what they think the max heap is and get 10 different
> > > answers. This is my take on heap sizes from the many clusters I have
> > > dealt with:
> > >
> > > 8GB -> Standard heap size; tends to run fine without any tuning
> > >
> > > 12GB -> Needs some TLC with regard to JVM tuning if your workload
> > > tends to cause churn (usually blockcache)
> > >
> > > 16GB -> GC tuning is a must, and now we need to start looking into
> > > MSLAB and ZK timeouts
> > >
> > > 20GB -> Same as 16GB in regard to tuning, but we tend to need to
> > > raise the ZK timeout a little higher
> > >
> > > 32GB -> We do have a couple of people running this high, but the
> > > pain outweighs the gains (IMHO)
> > >
> > > 64GB -> Let me know how it goes :)
> > >
> > > On Tue, Apr 30, 2013 at 4:07 AM, Andrew Purtell <[email protected]> wrote:
> > >
> > > > I don't wish to be rude, but you are presenting odd claims as fact
> > > > because they were "mentioned in a couple of posts". It will be
> > > > difficult to have a serious conversation. I encourage you to test
> > > > your hypotheses and let us know if in fact there is a JVM "heap
> > > > barrier" (and where it may be).
> > > >
> > > > On Monday, April 29, 2013, Asaf Mesika wrote:
> > > >
> > > > > I think for Phoenix truly to succeed, it needs HBase to break the
> > > > > JVM heap barrier of 12G that I saw mentioned in a couple of
> > > > > posts. Lots of analytics queries utilize memory, and since that
> > > > > memory is shared with HBase, there is only so much you can do on
> > > > > a 12GB heap. On the other hand, if Phoenix were implemented
> > > > > outside HBase on the same machine (as Drill and Impala do), you
> > > > > could have 60GB for this process, running many OLAP queries in
> > > > > parallel against the same data set.
> > > > >
> > > > > On Mon, Apr 29, 2013 at 9:08 PM, Andrew Purtell <[email protected]> wrote:
> > > > >
> > > > > > > HBase is not really intended for heavy data crunching
> > > > > >
> > > > > > Yes it is. This is why we have first-class MapReduce
> > > > > > integration and optimized scanners.
> > > > > >
> > > > > > Recent versions, like 0.94, also do pretty well with the 'O'
> > > > > > part of OLAP.
> > > > > >
> > > > > > Urban Airship's Datacube is an example of a successful OLAP
> > > > > > project implemented on HBase:
> > > > > > http://github.com/urbanairship/datacube
> > > > > >
> > > > > > "Urban Airship uses the datacube project to support its
> > > > > > analytics stack for mobile apps. We handle about ~10K events
> > > > > > per second per node."
> > > > > >
> > > > > > Also there is Adobe's SaasBase:
> > > > > > http://www.slideshare.net/clehene/hbase-and-hadoop-at-adobe
> > > > > >
> > > > > > Etc.
> > > > > >
> > > > > > Where an HBase OLAP application will differ tremendously from
> > > > > > a traditional data warehouse is of course in the interface to
> > > > > > the datastore. You have to design and speak in the language of
> > > > > > the HBase API, though Phoenix
> > > > > > (https://github.com/forcedotcom/phoenix) is changing that.
> > > > > >
> > > > > > On Sun, Apr 28, 2013 at 10:21 PM, anil gupta <[email protected]> wrote:
> > > > > >
> > > > > > > Hi Kiran,
> > > > > > >
> > > > > > > In HBase the data is denormalized, but at its core HBase is
> > > > > > > a KeyValue-based database meant for lookups or queries that
> > > > > > > expect responses in milliseconds. OLAP, i.e. data
> > > > > > > warehousing, usually involves heavy data crunching. HBase is
> > > > > > > not really intended for heavy data crunching. If you just
> > > > > > > want to store denormalized data and run simple queries, then
> > > > > > > HBase is good. For OLAP kinds of workloads you can make
> > > > > > > HBase work, but IMO you will be better off using Hive for
> > > > > > > data warehousing.
> > > > > > >
> > > > > > > HTH,
> > > > > > > Anil Gupta
> > > > > > >
> > > > > > > On Sun, Apr 28, 2013 at 8:39 PM, Kiran <[email protected]> wrote:
> > > > > > >
> > > > > > > > But in HBase, data can be said to be in a denormalized
> > > > > > > > state, as the methodology used for storage is a (column
> > > > > > > > family:column) based flexible schema. Also, from Google's
> > > > > > > > Bigtable paper it is evident that HBase is capable of
> > > > > > > > doing OLAP. So where does the difference lie?
> > > > > > > >
> > > > > > > > --
> > > > > > > > View this message in context:
> > > > > > > > http://apache-hbase.679495.n3.nabble.com/HBase-and-Datawarehouse-tp4043172p4043216.html
> > > > > > > > Sent from the HBase User mailing list archive at Nabble.com.
> > >
> > > --
> > > Kevin O'Dell
> > > Systems Engineer, Cloudera
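
Coming back to Kevin's ladder: the MSLAB and ZK-timeout knobs he mentions
are ordinary configuration properties. A sketch with the 0.94-era names (the
120-second timeout is just an example value, and these would normally go in
hbase-site.xml rather than code):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class BigHeapTuning {
        public static Configuration create() {
            Configuration conf = HBaseConfiguration.create();
            // MSLAB allocates memstore data in fixed-size chunks, which
            // reduces the old-gen fragmentation behind the churn Kevin
            // describes.
            conf.setBoolean("hbase.hregion.memstore.mslab.enabled", true);
            // A longer ZK session (in ms) keeps a long GC pause from
            // getting the regionserver declared dead.
            conf.setInt("zookeeper.session.timeout", 120000);
            return conf;
        }
    }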
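
On Anil's point about lookups: the millisecond sweet spot he describes is
the single-row read, which in the 0.94 client API looks roughly like the
sketch below. The table "metrics", family "d", qualifier "count", and the
long-encoded value are all hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PointLookup {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "metrics");
            try {
                // A single-row Get: the access pattern HBase is built
                // around, served from memstore/blockcache when hot.
                Get get = new Get(Bytes.toBytes("row-key"));
                Result result = table.get(get);
                byte[] value = result.getValue(Bytes.toBytes("d"),
                                               Bytes.toBytes("count"));
                System.out.println(Bytes.toLong(value));
            } finally {
                table.close();
            }
        }
    }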
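
And the interface shift Phoenix brings is that the same store can be
queried over plain JDBC. A sketch, assuming the Phoenix driver is on the
classpath and using its jdbc:phoenix:<zookeeper quorum> URL form; the table
and query are hypothetical:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class PhoenixQuery {
        public static void main(String[] args) throws Exception {
            Connection conn =
                DriverManager.getConnection("jdbc:phoenix:localhost");
            try {
                Statement stmt = conn.createStatement();
                // A hypothetical aggregate that Phoenix compiles into
                // HBase scans under the covers.
                ResultSet rs = stmt.executeQuery(
                    "SELECT host, COUNT(*) FROM metrics GROUP BY host");
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " " + rs.getLong(2));
                }
            } finally {
                conn.close();
            }
        }
    }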

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)