Hmmm, I don't recommend HBase in situations where you are not running an M/R framework. Sorry, as much as I love HBase, IMHO there are probably better solutions for a standalone NoSQL database. (YMMV depending on your use case.) The strength of HBase is that it's part of the Hadoop ecosystem.
I would think it would probably be better to go virtual than to run multiple region servers on bare hardware. You take a hit on I/O, but you can work around that too. But I'm conservative unless I have to get creative. ;-) Something to consider when whiteboarding ideas, though...

On Apr 30, 2013, at 1:30 PM, Andrew Purtell <[email protected]> wrote:

> You wouldn't do that if colocating MR. It is one way to soak up "extra"
> RAM on a large-RAM box, although I'm not sure I would recommend it (I
> have no personal experience trying it, yet). For more on this where
> people are actively considering it, see
> https://issues.apache.org/jira/browse/BIGTOP-732
>
> On Tue, Apr 30, 2013 at 11:14 AM, Michael Segel <[email protected]> wrote:
>
>> Multiple RS per host?
>> Huh?
>>
>> That seems very counterintuitive and potentially problematic with M/R
>> jobs. Could you expand on this?
>>
>> Thx
>>
>> -Mike
>>
>> On Apr 30, 2013, at 12:38 PM, Andrew Purtell <[email protected]> wrote:
>>
>>> Rules of thumb for starting off safely and for easing support issues
>>> are really good to have, but there are no hard barriers or singular
>>> approaches: use Java 7 + G1GC, disable the HBase blockcache in favor
>>> of the OS blockcache, run multiple regionservers per host. It is going
>>> to depend on how the cluster is used and loaded. If we are talking
>>> about coprocessors, then effective limits are less clear; using a
>>> coprocessor to integrate an external process implemented in native
>>> code, communicating over memory-mapped files in /dev/shm, isn't
>>> outside what is possible (strawman alert).
>>>
>>> On Tue, Apr 30, 2013 at 5:01 AM, Kevin O'Dell <[email protected]> wrote:
>>>
>>>> Asaf,
>>>>
>>>> The heap barrier is something of a legend :) You can ask 10 different
>>>> HBase committers what they think the max heap is and get 10 different
>>>> answers. This is my take on heap sizes from the many clusters I have
>>>> dealt with:
>>>>
>>>> 8GB -> Standard heap size, and tends to run fine without any tuning
>>>>
>>>> 12GB -> Needs some TLC with regard to JVM tuning if your workload
>>>> tends to cause churn (usually blockcache)
>>>>
>>>> 16GB -> GC tuning is a must, and now we need to start looking into
>>>> MSLAB and ZK timeouts
>>>>
>>>> 20GB -> Same as 16GB in regard to tuning, but we tend to need to
>>>> raise the ZK timeout a little higher
>>>>
>>>> 32GB -> We do have a couple of people running this high, but the pain
>>>> outweighs the gains (IMHO)
>>>>
>>>> 64GB -> Let me know how it goes :)
>>>>
>>>> On Tue, Apr 30, 2013 at 4:07 AM, Andrew Purtell <[email protected]> wrote:
>>>>
>>>>> I don't wish to be rude, but you are making odd claims as fact, as
>>>>> "mentioned in a couple of posts". It will be difficult to have a
>>>>> serious conversation. I encourage you to test your hypotheses and
>>>>> let us know if in fact there is a JVM "heap barrier" (and where it
>>>>> may be).
>>>>>
>>>>> On Monday, April 29, 2013, Asaf Mesika wrote:
>>>>>
>>>>>> I think for Phoenix truly to succeed, it needs HBase to break the
>>>>>> JVM heap barrier of 12G that I saw mentioned in a couple of posts.
>>>>>> Lots of analytics queries utilize memory, and since that memory is
>>>>>> shared with HBase, there is only so much you can do on a 12GB heap.
>>>>>> On the other hand, if Phoenix were implemented outside HBase on the
>>>>>> same machine (like Drill or Impala is doing), you could have 60GB
>>>>>> for this process, running many OLAP queries in parallel, utilizing
>>>>>> the same data set.
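For illustration, a minimal sketch of the /dev/shm strawman Andrew floats above: an external analytics process and the regionserver side exchanging bytes through a RAM-backed memory-mapped file, so the large analytics heap lives outside the regionserver JVM. The file path and layout here are hypothetical, not anything from the thread.

    // Hypothetical sketch: two processes sharing bytes via a memory-mapped
    // file under /dev/shm (tmpfs on Linux, so the mapping is RAM-backed).
    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class ShmSketch {
        public static void main(String[] args) throws Exception {
            // The path is made up; both processes would map the same file.
            try (RandomAccessFile f = new RandomAccessFile("/dev/shm/hbase-scratch", "rw");
                 FileChannel ch = f.getChannel()) {
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
                buf.putLong(0, System.currentTimeMillis());  // writer side
                System.out.println("shared: " + buf.getLong(0));  // a reader sees the same bytes
            }
        }
    }

A real integration would of course need a framing protocol and synchronization between the two processes; this only shows that the shared-memory channel itself is a few lines of standard NIO.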
>>>>>> On Mon, Apr 29, 2013 at 9:08 PM, Andrew Purtell <[email protected]> wrote:
>>>>>>
>>>>>>>> HBase is not really intended for heavy data crunching
>>>>>>>
>>>>>>> Yes it is. This is why we have first-class MapReduce integration
>>>>>>> and optimized scanners.
>>>>>>>
>>>>>>> Recent versions, like 0.94, also do pretty well with the 'O' part
>>>>>>> of OLAP.
>>>>>>>
>>>>>>> Urban Airship's Datacube is an example of a successful OLAP
>>>>>>> project implemented on HBase:
>>>>>>> http://github.com/urbanairship/datacube
>>>>>>>
>>>>>>> "Urban Airship uses the datacube project to support its analytics
>>>>>>> stack for mobile apps. We handle about ~10K events per second per
>>>>>>> node."
>>>>>>>
>>>>>>> Also there is Adobe's SaasBase:
>>>>>>> http://www.slideshare.net/clehene/hbase-and-hadoop-at-adobe
>>>>>>>
>>>>>>> Etc.
>>>>>>>
>>>>>>> Where an HBase OLAP application will differ tremendously from a
>>>>>>> traditional data warehouse is of course in the interface to the
>>>>>>> datastore. You have to design and speak in the language of the
>>>>>>> HBase API, though Phoenix
>>>>>>> (https://github.com/forcedotcom/phoenix) is changing that.
>>>>>>>
>>>>>>> On Sun, Apr 28, 2013 at 10:21 PM, anil gupta <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Kiran,
>>>>>>>>
>>>>>>>> In HBase the data is denormalized, but at its core HBase is a
>>>>>>>> KeyValue-based database meant for lookups or queries that expect
>>>>>>>> a response in milliseconds. OLAP, i.e. data warehousing, usually
>>>>>>>> involves heavy data crunching, and HBase is not really intended
>>>>>>>> for heavy data crunching. If you just want to store denormalized
>>>>>>>> data and do simple queries, then HBase is good. For OLAP kinds of
>>>>>>>> workloads you can make HBase work, but IMO you will be better off
>>>>>>>> using Hive for data warehousing.
>>>>>>>>
>>>>>>>> HTH,
>>>>>>>> Anil Gupta
>>>>>>>>
>>>>>>>> On Sun, Apr 28, 2013 at 8:39 PM, Kiran <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> But in HBase, data can be said to be in a denormalized state, as
>>>>>>>>> the methodology used for storage is a (column family:column)
>>>>>>>>> based flexible schema. Also, from Google's Bigtable paper it is
>>>>>>>>> evident that HBase is capable of doing OLAP. So where does the
>>>>>>>>> difference lie?
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> View this message in context:
>>>>>>>>> http://apache-hbase.679495.n3.nabble.com/HBase-and-Datawarehouse-tp4043172p4043216.html
>>>>>>>>> Sent from the HBase User mailing list archive at Nabble.com.
>>>>
>>>> --
>>>> Kevin O'Dell
>>>> Systems Engineer, Cloudera
>
> --
> Best regards,
>
> - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
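To make the "first-class MapReduce integration and optimized scanners" point concrete, here is a minimal sketch of a map-only row-count job over an HBase table. The table name and scanner settings are illustrative assumptions, not taken from the thread.

    // Hypothetical sketch: count rows of an HBase table with a map-only MR job.
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class RowCountSketch {

        static class CountMapper extends TableMapper<ImmutableBytesWritable, Result> {
            enum Counters { ROWS }  // a job counter is a cheap aggregate with no reduce phase

            @Override
            protected void map(ImmutableBytesWritable key, Result value, Context context)
                    throws IOException, InterruptedException {
                context.getCounter(Counters.ROWS).increment(1);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            Job job = Job.getInstance(conf, "row-count-sketch");
            job.setJarByClass(RowCountSketch.class);

            Scan scan = new Scan();
            scan.setCaching(500);        // batch more rows per RPC for full scans
            scan.setCacheBlocks(false);  // don't churn the blockcache from MR scans

            TableMapReduceUtil.initTableMapperJob(
                    "mytable", scan, CountMapper.class,  // "mytable" is a placeholder
                    ImmutableBytesWritable.class, Result.class, job);
            job.setOutputFormatClass(NullOutputFormat.class);
            job.setNumReduceTasks(0);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Raising scanner caching and disabling blockcache for the scan is the usual pairing for MR over HBase: the job streams through the table once, so caching its blocks would only evict data the serving path actually needs.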

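Finally, on Andrew's point that you have to design and speak in the language of the HBase API, though Phoenix is changing that: a short sketch contrasting a raw client-API Get with the same lookup through Phoenix's JDBC driver. The table, column, and ZooKeeper host names are invented for the example.

    // Hypothetical sketch: the same point lookup via the HBase client API
    // and via Phoenix SQL over JDBC.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class InterfaceSketch {
        public static void main(String[] args) throws Exception {
            // 1. Speaking the HBase API directly: byte[] keys, families, qualifiers.
            HTable table = new HTable(HBaseConfiguration.create(), "metrics");  // placeholder table
            Result r = table.get(new Get(Bytes.toBytes("row-42")));
            System.out.println(Bytes.toString(r.getValue(Bytes.toBytes("f"), Bytes.toBytes("val"))));
            table.close();

            // 2. The same lookup through Phoenix, where the interface is just SQL.
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host");  // placeholder quorum
                 Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery("SELECT val FROM metrics WHERE pk = 'row-42'")) {
                while (rs.next()) {
                    System.out.println(rs.getString("val"));
                }
            }
        }
    }

The second half assumes the table was created through Phoenix so its metadata is visible to the SQL layer; the first half works against any HBase table.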