I chose Amazon mainly because it is supposed to offer high performance and, more importantly, because HBase comes already set up when I launch it as an EMR workflow. I wanted to save the time it takes to set up the cluster manually on EC2 instances.
Are you saying I would get higher performance by setting up HBase on the cluster manually instead of using the default Amazon HBase distribution? Or is it worth tuning the Amazon distribution with a bootstrap action? How long does it take to set up a cluster with HDFS manually?

I will also try larger instance types.

On Thu, May 9, 2013 at 6:47 AM, Michel Segel <[email protected]> wrote:

> With respect to EMR, you can run HBase fairly easily.
> You can't run MapR with HBase on EMR; stick with Amazon's release.
>
> And you can run it, but you will want to know your tuning parameters up
> front when you instantiate it.
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On May 8, 2013, at 9:04 PM, Andrew Purtell <[email protected]> wrote:
>
> > M7 is not Apache HBase, or any HBase. It is a proprietary NoSQL datastore
> > with (I gather) an Apache HBase compatible Java API.
> >
> > As for running HBase on EC2, we recently discussed some particulars, see
> > the latter part of this thread: http://search-hadoop.com/m/rI1HpK90gu where
> > I hijack it. I wouldn't recommend launching HBase as part of an EMR flow
> > unless you want to use it only for temporary random access storage, and in
> > which case use m2.2xlarge/m2.4xlarge instance types. Otherwise, set up a
> > dedicated HBase backed storage service on high I/O instance types. The
> > fundamental issue is IO performance on the EC2 platform is fair to poor.
> >
> > I have also noticed a large difference in baseline block device latency
> > if using an old Amazon Linux AMI (< 2013) or the latest AMIs from this
> > year. Use the new ones, they cut the latency long tail in half. There were
> > some significant kernel level improvements I gather.
> >
> > On Wed, May 8, 2013 at 10:42 AM, Marcos Luis Ortiz Valmaseda
> > <[email protected]> wrote:
> >
> >> I think that when you are talking about RMap, you are referring to
> >> MapR's distribution.
> >> I think that MapR's team released a very good version of its Hadoop
> >> distribution focused on HBase called M7. You can see its overview here:
> >> http://www.mapr.com/products/mapr-editions/m7-edition
> >>
> >> But this release was under beta testing, and I see that it's not
> >> included in the Amazon Marketplace yet:
> >>
> >> https://aws.amazon.com/marketplace/seller-profile?id=802b0a25-877e-4b57-9007-a3fd284815a5
> >>
> >> 2013/5/7 Pal Konyves <[email protected]>
> >>
> >>> Hi,
> >>>
> >>> Has anyone got some recommendations about running HBase on EC2? I am
> >>> testing it, and so far I am very disappointed with it. I did not change
> >>> anything about the default 'Amazon distribution' installation. It has
> >>> one master node and two slave nodes, and write performance is around
> >>> 2500 small rows per sec at most, but I expected it to be way better. Oh,
> >>> and this is with batch put operations with autocommit turned off, where
> >>> each batch contains about 500-1000 rows... When I do it with autocommit,
> >>> it does not even reach 1000 rows per sec.
> >>>
> >>> All nodes were m1.large.
> >>>
> >>> Any experiences, suggestions? Is it worth trying the RMap distribution
> >>> instead of the Amazon one?
> >>>
> >>> Thanks,
> >>> Pal
> >>
> >> --
> >> Marcos Ortiz Valmaseda
> >> Product Manager at PDVSA
> >> http://about.me/marcosortiz
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
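
For reference, here is a rough back-of-envelope on the figures quoted above from the original post (about 2500 rows per sec, batches of 500-1000 rows). It is only a sketch: it assumes a single client sending batches back to back, which the thread does not actually state, and the function name is made up for illustration. Under that assumption, it shows how long each batch put would have to take to produce the reported rate.

```python
# Back-of-envelope: per-batch latency implied by the figures quoted in the thread.
# Assumption (not stated in the thread): one client sends batches sequentially.
REPORTED_ROWS_PER_SEC = 2500   # peak write rate reported in the original post
BATCH_SIZES = (500, 1000)      # rows per batch put, per the original post


def implied_batch_latency_ms(rows_per_sec, rows_per_batch):
    """Return the time one batch takes (in ms) if batches run back to back."""
    batches_per_sec = rows_per_sec / rows_per_batch
    return 1000.0 / batches_per_sec


for size in BATCH_SIZES:
    latency = implied_batch_latency_ms(REPORTED_ROWS_PER_SEC, size)
    print(f"{size} rows/batch -> {latency:.0f} ms per batch")
```

If that assumption holds, each batch takes roughly 200 to 400 ms, so measuring the time spent per batch put on the client would be a reasonable first step before switching distributions or instance types.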
