Tell me why your RS heap needs to be that large (> 8 GB). I think the answer is that it depends, especially when you start to add in coprocessors. I'm not saying that there are no legitimate reasons, but a lot of the time people just bump up the heap size without thinking about the problem. To Kevin's point, once you exceed a certain size you really need to start thinking about the tuning process.
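For what it's worth, the knobs Kevin's list below keeps coming back to are MSLAB and the ZooKeeper session timeout. Here is a minimal sketch of the corresponding settings using the stock property names; the values are placeholders, not recommendations, and in practice they belong in hbase-site.xml (with the GC flags themselves going into hbase-env.sh):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class HeapTuningSketch {
    public static void main(String[] args) {
        // Start from the stock HBase configuration (hbase-site.xml on the classpath).
        Configuration conf = HBaseConfiguration.create();

        // MSLAB (MemStore-Local Allocation Buffers) reduces old-gen fragmentation
        // from memstore churn; it is on by default in recent releases.
        conf.setBoolean("hbase.hregion.memstore.mslab.enabled", true);

        // With a large heap, a long GC pause can blow past the ZooKeeper session
        // timeout and get the RegionServer declared dead. Placeholder value only.
        conf.setInt("zookeeper.session.timeout", 120000); // 120 s, illustrative

        System.out.println("MSLAB enabled: "
                + conf.getBoolean("hbase.hregion.memstore.mslab.enabled", false));
    }
}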
MSLAB is now on by default, or so I am told. Just because you can do something doesn't mean it's a good idea. ;-)

On Apr 30, 2013, at 7:01 AM, Kevin O'dell <[email protected]> wrote:

> Asaf,
>
> The heap barrier is something of a legend :) You can ask 10 different
> HBase committers what they think the max heap is and get 10 different
> answers. This is my take on heap sizes from the many clusters I have
> dealt with:
>
> 8GB -> Standard heap size, and tends to run fine without any tuning
>
> 12GB -> Needs some TLC with regard to JVM tuning if your workload tends
> to cause churn (usually block cache)
>
> 16GB -> GC tuning is a must, and now we need to start looking into MSLAB
> and ZK timeouts
>
> 20GB -> Same as 16GB in regard to tuning, but we tend to need to raise
> the ZK timeout a little higher
>
> 32GB -> We do have a couple of people running this high, but the pain
> outweighs the gains (IMHO)
>
> 64GB -> Let me know how it goes :)
>
> On Tue, Apr 30, 2013 at 4:07 AM, Andrew Purtell <[email protected]> wrote:
>
>> I don't wish to be rude, but you are presenting odd claims as fact, as
>> "mentioned in a couple of posts". It will be difficult to have a serious
>> conversation. I encourage you to test your hypotheses and let us know
>> whether in fact there is a JVM "heap barrier" (and where it may be).
>>
>> On Monday, April 29, 2013, Asaf Mesika wrote:
>>
>>> I think for Phoenix truly to succeed, it needs HBase to break the JVM
>>> heap barrier of 12G that I saw mentioned in a couple of posts. Lots of
>>> analytics queries utilize memory, and since that memory is shared with
>>> HBase, there is only so much you can do on a 12GB heap. On the other
>>> hand, if Phoenix were implemented outside HBase on the same machine
>>> (like Drill or Impala do), you could have 60GB for that process,
>>> running many OLAP queries in parallel, utilizing the same data set.
>>>
>>> On Mon, Apr 29, 2013 at 9:08 PM, Andrew Purtell <[email protected]> wrote:
>>>
>>>>> HBase is not really intended for heavy data crunching
>>>>
>>>> Yes it is. This is why we have first-class MapReduce integration and
>>>> optimized scanners.
>>>>
>>>> Recent versions, like 0.94, also do pretty well with the 'O' part of
>>>> OLAP.
>>>>
>>>> Urban Airship's Datacube is an example of a successful OLAP project
>>>> implemented on HBase: http://github.com/urbanairship/datacube
>>>>
>>>> "Urban Airship uses the datacube project to support its analytics
>>>> stack for mobile apps. We handle about ~10K events per second per
>>>> node."
>>>>
>>>> Also there is Adobe's SaasBase:
>>>> http://www.slideshare.net/clehene/hbase-and-hadoop-at-adobe
>>>>
>>>> Etc.
>>>>
>>>> Where an HBase OLAP application will differ tremendously from a
>>>> traditional data warehouse is of course in the interface to the
>>>> datastore. You have to design and speak in the language of the HBase
>>>> API, though Phoenix (https://github.com/forcedotcom/phoenix) is
>>>> changing that.
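Andrew's point about the MapReduce integration is easy to see in code. Below is a rough sketch of wiring a full-table scan into a map-only job with the 0.94-era client API; the table name "events", the column family "d", and the caching value are made up for illustration, not taken from anything above.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class CrunchJob {

    // Minimal mapper that just counts rows; the real crunching would go here.
    public static class RowCountMapper
            extends TableMapper<ImmutableBytesWritable, Result> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context ctx)
                throws IOException, InterruptedException {
            ctx.getCounter("crunch", "rows").increment(1);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "hbase-crunch-sketch");
        job.setJarByClass(CrunchJob.class);

        Scan scan = new Scan();
        scan.setCaching(500);               // rows fetched per RPC; placeholder value
        scan.setCacheBlocks(false);         // keep a full scan out of the block cache
        scan.addFamily(Bytes.toBytes("d")); // hypothetical column family

        // "events" is a made-up table name for illustration.
        TableMapReduceUtil.initTableMapperJob(
                "events", scan, RowCountMapper.class, null, null, job);
        job.setOutputFormatClass(NullOutputFormat.class);
        job.setNumReduceTasks(0);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}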
>>>> On Sun, Apr 28, 2013 at 10:21 PM, anil gupta <[email protected]> wrote:
>>>>
>>>>> Hi Kiran,
>>>>>
>>>>> In HBase the data is denormalized, but at its core HBase is a
>>>>> KeyValue-based database meant for lookups or queries that expect a
>>>>> response in milliseconds. OLAP, i.e. data warehousing, usually
>>>>> involves heavy data crunching. HBase is not really intended for
>>>>> heavy data crunching. If you just want to store denormalized data
>>>>> and do simple queries, then HBase is good. For OLAP kind of stuff,
>>>>> you can make HBase work, but IMO you will be better off using Hive
>>>>> for data warehousing.
>>>>>
>>>>> HTH,
>>>>> Anil Gupta
>>>>>
>>>>> On Sun, Apr 28, 2013 at 8:39 PM, Kiran <[email protected]> wrote:
>>>>>
>>>>>> But in HBase data can be said to be in a denormalised state, as the
>>>>>> methodology used for storage is a (column family:column) based
>>>>>> flexible schema. Also, from Google's Bigtable paper it is evident
>>>>>> that HBase is capable of doing OLAP. So where does the difference
>>>>>> lie?
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://apache-hbase.679495.n3.nabble.com/HBase-and-Datawarehouse-tp4043172p4043216.html
>>>>>> Sent from the HBase User mailing list archive at Nabble.com.
>>>>
>>>> --
>>>> Best regards,
>>>>
>>>>   - Andy
>>>>
>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>>> (via Tom White)
>>
>> --
>> Best regards,
>>
>>   - Andy
>>
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> (via Tom White)
>
> --
> Kevin O'Dell
> Systems Engineer, Cloudera
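On the Phoenix angle: the attraction is that you stop hand-coding scans and speak SQL over the same HBase tables through a JDBC driver. A rough sketch of what that looks like, assuming the Phoenix client jar is on the classpath; the "localhost" quorum, the EVENTS table, and its columns are invented for illustration:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixSketch {
    public static void main(String[] args) throws Exception {
        // "localhost" stands in for the ZooKeeper quorum of a real cluster.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
             Statement stmt = conn.createStatement()) {

            // EVENTS and its HOST column are hypothetical; Phoenix maps the query
            // onto HBase scans and aggregates on the server side.
            ResultSet rs = stmt.executeQuery(
                "SELECT host, COUNT(*) AS event_count FROM events GROUP BY host");
            while (rs.next()) {
                System.out.println(rs.getString("HOST") + " -> "
                        + rs.getLong("EVENT_COUNT"));
            }
        }
    }
}

Phoenix pushes that aggregation down into the RegionServers via coprocessors, which is exactly why Asaf's heap question matters: the work competes with the block cache and memstores for the same heap.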
