Multiple RS per host? Huh? That seems very counterintuitive and potentially problematic with M/R jobs. Could you expand on this?
Thx -Mike

On Apr 30, 2013, at 12:38 PM, Andrew Purtell <[email protected]> wrote:

> Rules of thumb for starting off safely and for easing support issues are
> really good to have, but there are no hard barriers or singular approaches:
> use Java 7 + G1GC, disable the HBase blockcache in favor of the OS
> blockcache, run multiple regionservers per host. It is going to depend on
> how the cluster is used and loaded. If we are talking about coprocessors,
> then effective limits are less clear; using a coprocessor to integrate an
> external process implemented with native code communicating over memory
> mapped files in /dev/shm isn't outside what is possible (strawman alert).
>
>
> On Tue, Apr 30, 2013 at 5:01 AM, Kevin O'dell <[email protected]> wrote:
>
>> Asaf,
>>
>> The heap barrier is something of a legend :) You can ask 10 different
>> HBase committers what they think the max heap is and get 10 different
>> answers. This is my take on heap sizes from the many clusters I have
>> dealt with:
>>
>> 8GB -> Standard heap size, and tends to run fine without any tuning
>>
>> 12GB -> Needs some TLC with regards to JVM tuning if your workload tends
>> to cause churn (usually blockcache)
>>
>> 16GB -> GC tuning is a must, and now we need to start looking into MSLAB
>> and ZK timeouts
>>
>> 20GB -> Same as 16GB in regards to tuning, but we tend to need to raise
>> the ZK timeout a little higher
>>
>> 32GB -> We do have a couple of people running this high, but the pain
>> outweighs the gains (IMHO)
>>
>> 64GB -> Let me know how it goes :)
>>
>>
>> On Tue, Apr 30, 2013 at 4:07 AM, Andrew Purtell <[email protected]>
>> wrote:
>>
>>> I don't wish to be rude, but you are presenting odd claims as fact
>>> because they were "mentioned in a couple of posts". It will be difficult
>>> to have a serious conversation. I encourage you to test your hypotheses
>>> and let us know if in fact there is a JVM "heap barrier" (and where it
>>> may be).
>>>
>>> On Monday, April 29, 2013, Asaf Mesika wrote:
>>>
>>>> I think for Phoenix truly to succeed, it needs HBase to break the JVM
>>>> heap barrier of 12G, as I saw mentioned in a couple of posts. Since
>>>> lots of analytics queries utilize memory, and since that memory is
>>>> shared with HBase, there is only so much you can do on a 12GB heap. On
>>>> the other hand, if Phoenix were implemented outside HBase on the same
>>>> machine (like Drill or Impala is doing), you could have 60GB for this
>>>> process, running many OLAP queries in parallel, utilizing the same
>>>> data set.
>>>>
>>>>
>>>> On Mon, Apr 29, 2013 at 9:08 PM, Andrew Purtell <[email protected]>
>>>> wrote:
>>>>
>>>>>> HBase is not really intended for heavy data crunching
>>>>>
>>>>> Yes it is. This is why we have first class MapReduce integration and
>>>>> optimized scanners.
>>>>>
>>>>> Recent versions, like 0.94, also do pretty well with the 'O' part of
>>>>> OLAP.
>>>>>
>>>>> Urban Airship's Datacube is an example of a successful OLAP project
>>>>> implemented on HBase: http://github.com/urbanairship/datacube
>>>>>
>>>>> "Urban Airship uses the datacube project to support its analytics
>>>>> stack for mobile apps. We handle about ~10K events per second per
>>>>> node."
>>>>>
>>>>> Also there is Adobe's SaasBase:
>>>>> http://www.slideshare.net/clehene/hbase-and-hadoop-at-adobe
>>>>>
>>>>> Etc.
>>>>>
>>>>> Where an HBase OLAP application will differ tremendously from a
>>>>> traditional data warehouse is of course in the interface to the
>>>>> datastore. You have to design and speak in the language of the HBase
>>>>> API, though Phoenix (https://github.com/forcedotcom/phoenix) is
>>>>> changing that.
>>>>>
>>>>>
>>>>> On Sun, Apr 28, 2013 at 10:21 PM, anil gupta <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Kiran,
>>>>>>
>>>>>> In HBase the data is denormalized, but at its core HBase is a
>>>>>> KeyValue-based database meant for lookups or queries that expect a
>>>>>> response in milliseconds. OLAP, i.e. data warehousing, usually
>>>>>> involves heavy data crunching. HBase is not really intended for
>>>>>> heavy data crunching. If you want to just store denormalized data
>>>>>> and do simple queries, then HBase is good. For OLAP kinds of
>>>>>> workloads you can make HBase work, but IMO you will be better off
>>>>>> using Hive for data warehousing.
>>>>>>
>>>>>> HTH,
>>>>>> Anil Gupta
>>>>>>
>>>>>>
>>>>>> On Sun, Apr 28, 2013 at 8:39 PM, Kiran <[email protected]> wrote:
>>>>>>
>>>>>>> But in HBase the data can be said to be in a denormalised state,
>>>>>>> as the methodology used for storage is a (column family:column)
>>>>>>> based flexible schema. Also, from Google's Bigtable paper it is
>>>>>>> evident that HBase is capable of doing OLAP. So where does the
>>>>>>> difference lie?
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> View this message in context:
>>>>>>> http://apache-hbase.679495.n3.nabble.com/HBase-and-Datawarehouse-tp4043172p4043216.html
>>>>>>> Sent from the HBase User mailing list archive at Nabble.com.
>>>>>>
>>>>>
>>>>> --
>>>>> Best regards,
>>>>>
>>>>> - Andy
>>>>>
>>>>> Problems worthy of attack prove their worth by hitting back. - Piet
>>>>> Hein (via Tom White)
>>>>
>>>
>>>
>>> --
>>> Best regards,
>>>
>>> - Andy
>>>
>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>> (via Tom White)
>>
>>
>> --
>> Kevin O'Dell
>> Systems Engineer, Cloudera
>
>
> --
> Best regards,
>
> - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
