no no, 20 GB heap per node; each node with 24-32 GB RAM, etc. We can't rely on the Linux buffer cache to save us, so we have to cache in HBase RAM.
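(As a concrete sketch of the heap setting being discussed, assuming a tarball install with a conf/hbase-env.sh; the values are illustrative, not from the thread:)

```shell
# conf/hbase-env.sh -- illustrative values only, not a recommendation.
# HBASE_HEAPSIZE is in MB, so 20000 is roughly the 20 GB heap discussed
# above, leaving the rest of a 24-32 GB box for the DataNode and OS.
export HBASE_HEAPSIZE=20000

# A heap this large usually wants the concurrent collector to keep GC
# pauses tolerable (common advice for 0.20/0.89-era HBase).
export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"
```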
:-) -ryan On Mon, Sep 20, 2010 at 9:44 PM, Jack Levin <[email protected]> wrote: > 20GB+?, hmmm..... I do plan to run 50 regionserver nodes though, with > 3 GB Heap likely, this should be plenty to rip through say, 350TB of > data. > > -Jack > > On Mon, Sep 20, 2010 at 9:39 PM, Ryan Rawson <[email protected]> wrote: >> yes, that is the new ZK-based coordination. When I publish the SU code >> we have a patch which limits that and is faster. 2GB is a little >> small for regionserver memory... in my ideal world we'll be putting >> 20GB+ of RAM into the regionserver. >> >> I just figured you were using the DEB/RPMs because your files were in >> /usr/local... I usually run everything out of /home/hadoop b/c it >> allows me to easily rsync as user hadoop. >> >> but you are on the right track, yes :-) >> >> On Mon, Sep 20, 2010 at 9:32 PM, Jack Levin <[email protected]> wrote: >>> Who said anything about deb :). I do use tarballs.... Yes, so what did >>> it was copying that jar to under hbase/lib, and then a full restart. 
>>> Now here is a funny thing, the master shuddered for about 10 minutes, >>> spewing those messages: >>> >>> 2010-09-20 21:23:45,826 DEBUG org.apache.hadoop.hbase.master.HMaster: >>> Event NodeCreated with state SyncConnected with path >>> /hbase/UNASSIGNED/97999366 >>> 2010-09-20 21:23:45,827 DEBUG >>> org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event >>> NodeCreated with path /hbase/UNASSIGNED/97999366 >>> 2010-09-20 21:23:45,827 DEBUG >>> org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS: >>> Got zkEvent NodeCreated state:SyncConnected >>> path:/hbase/UNASSIGNED/97999366 >>> 2010-09-20 21:23:45,827 DEBUG >>> org.apache.hadoop.hbase.master.RegionManager: Created/updated >>> UNASSIGNED zNode img15,normal052q.jpg,1285001686282.97999366 in state >>> M2ZK_REGION_OFFLINE >>> 2010-09-20 21:23:45,828 INFO >>> org.apache.hadoop.hbase.master.RegionServerOperation: >>> img13,p1000319tq.jpg,1284952655960.812544765 open on >>> 10.103.2.3,60020,1285042333293 >>> 2010-09-20 21:23:45,828 DEBUG >>> org.apache.hadoop.hbase.master.ZKUnassignedWatcher: Got event type [ >>> M2ZK_REGION_OFFLINE ] for region 97999366 >>> 2010-09-20 21:23:45,828 DEBUG org.apache.hadoop.hbase.master.HMaster: >>> Event NodeChildrenChanged with state SyncConnected with path >>> /hbase/UNASSIGNED >>> 2010-09-20 21:23:45,828 DEBUG >>> org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event >>> NodeChildrenChanged with path /hbase/UNASSIGNED >>> 2010-09-20 21:23:45,828 DEBUG >>> org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS: >>> Got zkEvent NodeChildrenChanged state:SyncConnected >>> path:/hbase/UNASSIGNED >>> 2010-09-20 21:23:45,830 DEBUG >>> org.apache.hadoop.hbase.master.BaseScanner: Current assignment of >>> img150,,1284859678248.3116007 is not valid; >>> serverAddress=10.103.2.1:60020, startCode=1285038205920 unknown. >>> >>> >>> Does anyone know what they mean? At first it would kill one of my >>> datanodes. 
But what helped is when I changed the heap size to 4GB for >>> master and 2GB for the datanode that was dying, and after 10 minutes I got >>> into a clean state. >>> >>> -Jack >>> >>> >>> On Mon, Sep 20, 2010 at 9:28 PM, Ryan Rawson <[email protected]> wrote: >>>> yes, on every single machine as well, and restart. >>>> >>>> again, not sure how you'd do this in a scalable manner with your >>>> deb packages... on the source tarball you can just replace it, rsync >>>> it out and done. >>>> >>>> :-) >>>> >>>> On Mon, Sep 20, 2010 at 8:56 PM, Jack Levin <[email protected]> wrote: >>>>> ok, I found that file, do I replace hadoop-core.*.jar under >>>>> /usr/lib/hbase/lib? >>>>> Then restart, etc? All regionservers too? >>>>> >>>>> -Jack >>>>> >>>>> On Mon, Sep 20, 2010 at 8:40 PM, Ryan Rawson <[email protected]> wrote: >>>>>> Well I don't really run CDH, I disagree with their rpm/deb packaging >>>>>> policies and I have to highly recommend not using DEBs to install >>>>>> software... >>>>>> >>>>>> So normally, installing from tarball, the jar is in >>>>>> <installpath>/hadoop-0.20.0-320/hadoop-core-0.20.2+320.jar >>>>>> >>>>>> On the CDH/DEB edition, it's somewhere silly... locate and find will be >>>>>> your friend. It should be called hadoop-core-0.20.2+320.jar though! >>>>>> >>>>>> I'm working on a github publish of SU's production system, which uses >>>>>> the cloudera maven repo to install the correct JAR in hbase so when >>>>>> you type 'mvn assembly:assembly' to build your own hbase-*-bin.tar.gz >>>>>> (the * being whatever version you specified in pom.xml) the cdh3b2 jar >>>>>> comes pre-packaged. >>>>>> >>>>>> Stay tuned :-) >>>>>> >>>>>> -ryan >>>>>> >>>>>> On Mon, Sep 20, 2010 at 8:36 PM, Jack Levin <[email protected]> wrote: >>>>>>> Ryan, hadoop jar, what is the usual path to the file? I just want to be >>>>>>> sure, and where do I put it? 
>>>>>>> >>>>>>> -Jack >>>>>>> >>>>>>> On Mon, Sep 20, 2010 at 8:30 PM, Ryan Rawson <[email protected]> wrote: >>>>>>>> you need 2 more things: >>>>>>>> >>>>>>>> - restart hdfs >>>>>>>> - make sure the hadoop jar from your install replaces the one we ship >>>>>>>> with >>>>>>>> >>>>>>>> >>>>>>>> On Mon, Sep 20, 2010 at 8:22 PM, Jack Levin <[email protected]> wrote: >>>>>>>>> So, I switched to 0.89, and we already had CDH3 >>>>>>>>> (hadoop-0.20-datanode-0.20.2+320-3.noarch). Even though I added >>>>>>>>> <name>dfs.support.append</name> as true to both hdfs-site.xml and >>>>>>>>> hbase-site.xml, the master still reports this: >>>>>>>>> >>>>>>>>> You are currently running the HMaster without HDFS append support >>>>>>>>> enabled. This may result in data loss. Please see the HBase wiki for >>>>>>>>> details. >>>>>>>>> Master Attributes >>>>>>>>> Attribute Name Value Description >>>>>>>>> HBase Version 0.89.20100726, r979826 HBase version and svn revision >>>>>>>>> HBase Compiled Sat Jul 31 02:01:58 PDT 2010, stack When HBase >>>>>>>>> version >>>>>>>>> was compiled and by whom >>>>>>>>> Hadoop Version 0.20.2, r911707 Hadoop version and svn revision >>>>>>>>> Hadoop Compiled Fri Feb 19 08:07:34 UTC 2010, chrisdo When Hadoop >>>>>>>>> version was compiled and by whom >>>>>>>>> HBase Root Directory hdfs://namenode-rd.imageshack.us:9000/hbase >>>>>>>>> Location >>>>>>>>> of HBase home directory >>>>>>>>> >>>>>>>>> Any ideas what's wrong? >>>>>>>>> >>>>>>>>> -Jack >>>>>>>>> >>>>>>>>> >>>>>>>>> On Mon, Sep 20, 2010 at 5:47 PM, Ryan Rawson <[email protected]> >>>>>>>>> wrote: >>>>>>>>>> Hey, >>>>>>>>>> >>>>>>>>>> There is actually only 1 active branch of hbase, that being the 0.89 >>>>>>>>>> release, which is based on 'trunk'. We have snapshotted a series of >>>>>>>>>> 0.89 "developer releases" in hopes that people would try them out and >>>>>>>>>> start thinking about the next major version. One of these is what SU >>>>>>>>>> is running prod on. 
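(A minimal sketch of the jar swap and rsync-out being described here. The paths follow the ones mentioned in the thread; the regionserver hostnames rs1/rs2 are hypothetical, and the function only prints the commands so nothing runs by accident:)

```shell
#!/bin/sh
# Sketch of replacing the hadoop-core jar bundled with HBase by the one
# the cluster's HDFS actually runs, then pushing hbase/lib to every
# regionserver. The echos make this a dry run that PRINTS the plan;
# remove them to really execute. Restarting HDFS and then HBase
# afterwards, as noted above, is still a manual step.
sync_hbase_jar() {
    jar=$1; lib=$2; shift 2    # $1 = hadoop jar, $2 = hbase lib dir, rest = hosts
    echo rm -f "$lib/hadoop-core-0.20.2+320.jar"   # drop the bundled jar
    echo cp "$jar" "$lib/"                         # install the cluster's jar
    for host in "$@"; do                           # push to each regionserver
        echo rsync -a "$lib/" "$host:$lib/"
    done
}

sync_hbase_jar /usr/lib/hadoop/hadoop-core-0.20.2+320.jar /usr/lib/hbase/lib rs1 rs2
```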
>>>>>>>>>> >>>>>>>>>> At this point tracking 0.89 and which ones are the 'best' patch sets >>>>>>>>>> to run is a bit of a contact sport, but if you are serious about not >>>>>>>>>> losing data it is worthwhile. SU is based on the most recent DR with >>>>>>>>>> a few minor patches of our own concoction brought in. It currently >>>>>>>>>> works, but some Master ops are slow, and there are a few patches on >>>>>>>>>> top of that. I'll poke about and see if it's possible to publish to a >>>>>>>>>> github branch or something. >>>>>>>>>> >>>>>>>>>> -ryan >>>>>>>>>> >>>>>>>>>> On Mon, Sep 20, 2010 at 5:16 PM, Jack Levin <[email protected]> >>>>>>>>>> wrote: >>>>>>>>>>> Sounds good, only reason I ask is because of this: >>>>>>>>>>> >>>>>>>>>>> There are currently two active branches of HBase: >>>>>>>>>>> >>>>>>>>>>> * 0.20 - the current stable release series, being maintained with >>>>>>>>>>> patches for bug fixes only. This release series does not support >>>>>>>>>>> HDFS >>>>>>>>>>> durability - edits may be lost in the case of node failure. >>>>>>>>>>> * 0.89 - a development release series with active feature and >>>>>>>>>>> stability development, not currently recommended for production use. >>>>>>>>>>> This release does support HDFS durability - cases in which edits are >>>>>>>>>>> lost are considered serious bugs. >>>>>>>>>>> >>>>>>>>>>> Are we talking about data loss in case of a datanode going down while >>>>>>>>>>> being written to, or a RegionServer going down? >>>>>>>>>>> >>>>>>>>>>> -jack >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Mon, Sep 20, 2010 at 4:09 PM, Ryan Rawson <[email protected]> >>>>>>>>>>> wrote: >>>>>>>>>>>> We run 0.89 in production @ Stumbleupon. We also employ 3 >>>>>>>>>>>> committers... >>>>>>>>>>>> >>>>>>>>>>>> As for safety, you have no choice but to run 0.89. If you run a >>>>>>>>>>>> 0.20 >>>>>>>>>>>> release you will lose data. 
you must be on 0.89 and >>>>>>>>>>>> CDH3/append-branch to achieve data durability, and there really is >>>>>>>>>>>> no >>>>>>>>>>>> argument around it. If you are doing your tests with 0.20.6 now, >>>>>>>>>>>> I'd >>>>>>>>>>>> stop and rebase those tests onto the latest DR announced on the >>>>>>>>>>>> list. >>>>>>>>>>>> >>>>>>>>>>>> -ryan >>>>>>>>>>>> >>>>>>>>>>>> On Mon, Sep 20, 2010 at 3:17 PM, Jack Levin <[email protected]> >>>>>>>>>>>> wrote: >>>>>>>>>>>>> Hi Stack, see inline: >>>>>>>>>>>>> >>>>>>>>>>>>> On Mon, Sep 20, 2010 at 2:42 PM, Stack <[email protected]> wrote: >>>>>>>>>>>>>> Hey Jack: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks for writing. >>>>>>>>>>>>>> >>>>>>>>>>>>>> See below for some comments. >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Mon, Sep 20, 2010 at 11:00 AM, Jack Levin <[email protected]> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Image-Shack gets close to two million image uploads per day, >>>>>>>>>>>>>>> which are >>>>>>>>>>>>>>> usually stored on regular servers (we have about 700), as >>>>>>>>>>>>>>> regular >>>>>>>>>>>>>>> files, and each server has its own host name, such as (img55). >>>>>>>>>>>>>>> I've >>>>>>>>>>>>>>> been researching how to improve our backend design in terms >>>>>>>>>>>>>>> of data >>>>>>>>>>>>>>> safety and stumbled onto the HBase project. >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Any other requirements other than data safety? (latency, etc). >>>>>>>>>>>>> >>>>>>>>>>>>> Latency is the second requirement. We have some services that are >>>>>>>>>>>>> very short tail, and can produce a 95% cache hit rate, so I assume >>>>>>>>>>>>> this >>>>>>>>>>>>> would really put the cache to good use. Some other services, >>>>>>>>>>>>> however, >>>>>>>>>>>>> have about a 25% cache hit ratio, in which case the latency should >>>>>>>>>>>>> be >>>>>>>>>>>>> 'adequate', e.g. if it's slightly worse than getting data off raw >>>>>>>>>>>>> disk, >>>>>>>>>>>>> then it's good enough. 
Safety is supremely important, then >>>>>>>>>>>>> availability, then speed. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>> Now, I think HBase is the most beautiful thing that has happened >>>>>>>>>>>>>>> to the distributed DB world :). The idea is to store image files >>>>>>>>>>>>>>> (about >>>>>>>>>>>>>>> 400KB on average) into HBase. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I'd guess some images are much bigger than this. Do you ever >>>>>>>>>>>>>> limit >>>>>>>>>>>>>> the size of images folks can upload to your service? >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> The setup will include the following >>>>>>>>>>>>>>> configuration: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 50 servers total (2 datacenters), with 8 GB RAM, dual core cpu, >>>>>>>>>>>>>>> 6 x >>>>>>>>>>>>>>> 2TB disks each. >>>>>>>>>>>>>>> 3 to 5 Zookeepers >>>>>>>>>>>>>>> 2 Masters (in a datacenter each) >>>>>>>>>>>>>>> 10 to 20 Stargate REST instances (one per server, hash >>>>>>>>>>>>>>> loadbalanced) >>>>>>>>>>>>>> >>>>>>>>>>>>>> What's your frontend? Why REST? It might be more efficient if >>>>>>>>>>>>>> you >>>>>>>>>>>>>> could run with thrift given REST base64s its payload IIRC (check >>>>>>>>>>>>>> the >>>>>>>>>>>>>> src yourself). >>>>>>>>>>>>> >>>>>>>>>>>>> For insertion we use Haproxy, and balance curl PUTs across >>>>>>>>>>>>> multiple REST APIs. >>>>>>>>>>>>> For reading, it's an nginx proxy that does Content-type modification >>>>>>>>>>>>> from image/jpeg to octet-stream, and vice versa; >>>>>>>>>>>>> it then hits Haproxy again, which hits balanced REST. >>>>>>>>>>>>> Why REST? It was the simplest thing to run, given that it >>>>>>>>>>>>> supports >>>>>>>>>>>>> HTTP; potentially we could rewrite something for thrift, as long >>>>>>>>>>>>> as we >>>>>>>>>>>>> can still use HTTP to send and receive data (has anyone written >>>>>>>>>>>>> anything like that, say in Python, C or Java?) 
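(A sketch of the curl-based insert path described here. The table name "img", the column "image:data", and the haproxy host are all made up, and the exact Stargate resource layout should be checked against the version you run; a raw application/octet-stream PUT is what avoids the base64 overhead of the XML representation that Stack mentions. The commands are only printed, not executed:)

```shell
#!/bin/sh
# Hypothetical names throughout: table "img", family:qualifier
# "image:data", haproxy frontend haproxy.example.com:8080.
stargate_url() {
    # Row key = image filename; Stargate-style /table/row/column resource.
    echo "http://haproxy.example.com:8080/img/$1/image:data"
}

upload_image() {
    # Print the curl PUT that would store the raw bytes of file $1;
    # drop the leading echo to really send it.
    echo curl -X PUT -H "Content-Type: application/octet-stream" \
        --data-binary "@$1" "$(stargate_url "$1")"
}

upload_image normal052q.jpg
```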
>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> 40 to 50 RegionServers (will probably keep masters separate on >>>>>>>>>>>>>>> dedicated boxes). >>>>>>>>>>>>>>> 2 Namenode servers (one backup, highly available, will do >>>>>>>>>>>>>>> fsimage and >>>>>>>>>>>>>>> edits snapshots also) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> So far I got about 13 servers running, and doing about 20 >>>>>>>>>>>>>>> insertions / >>>>>>>>>>>>>>> second (file size ranging from a few KB to 2-3MB, ave. 400KB) via >>>>>>>>>>>>>>> the Stargate API. Our frontend servers receive files, and I just >>>>>>>>>>>>>>> fork-insert them into Stargate via HTTP (curl). >>>>>>>>>>>>>>> The inserts are humming along nicely, without any noticeable >>>>>>>>>>>>>>> load on >>>>>>>>>>>>>>> regionservers; so far we have inserted about 2 TB worth of images. >>>>>>>>>>>>>>> I have adjusted the region file size to be 512MB, and table >>>>>>>>>>>>>>> block size >>>>>>>>>>>>>>> to about 400KB, trying to match the average access block to limit >>>>>>>>>>>>>>> HDFS >>>>>>>>>>>>>>> trips. >>>>>>>>>>>>>> >>>>>>>>>>>>>> As Todd suggests, I'd go up from 512MB... 1G at least. You'll >>>>>>>>>>>>>> probably want to up your flush size from 64MB to 128MB or maybe >>>>>>>>>>>>>> 192MB. >>>>>>>>>>>>> >>>>>>>>>>>>> Yep, I will adjust to 1G. I thought flush was controlled by a >>>>>>>>>>>>> function of memstore HEAP, something like 40%? Or are you talking >>>>>>>>>>>>> about HDFS block size? >>>>>>>>>>>>> >>>>>>>>>>>>>> So far the read performance was more than adequate, and of >>>>>>>>>>>>>>> course write performance is nowhere near capacity. >>>>>>>>>>>>>>> So right now, all newly uploaded images go to HBase. But we do >>>>>>>>>>>>>>> plan >>>>>>>>>>>>>>> to insert about 170 Million images (about 100 days worth), >>>>>>>>>>>>>>> which is >>>>>>>>>>>>>>> only about 64 TB, or 10% of the planned cluster size of 600TB. >>>>>>>>>>>>>>> The end goal is to have a storage system that provides data >>>>>>>>>>>>>>> safety, >>>>>>>>>>>>>>> e.g. 
the system may go down but data cannot be lost. Our >>>>>>>>>>>>>>> Front-End >>>>>>>>>>>>>>> servers will continue to serve images from their own file >>>>>>>>>>>>>>> system (we >>>>>>>>>>>>>>> are serving about 16 Gbits at peak), however should we need to >>>>>>>>>>>>>>> bring >>>>>>>>>>>>>>> any of those down for maintenance, we will redirect all traffic >>>>>>>>>>>>>>> to >>>>>>>>>>>>>>> HBase (should be no more than a few hundred Mbps), while the >>>>>>>>>>>>>>> front end >>>>>>>>>>>>>>> server is repaired (for example having its disk replaced); >>>>>>>>>>>>>>> after the >>>>>>>>>>>>>>> repairs, we quickly repopulate it with the missing files, while >>>>>>>>>>>>>>> serving >>>>>>>>>>>>>>> the remaining missing ones off HBase. >>>>>>>>>>>>>>> All in all it should be a very interesting project, and I am >>>>>>>>>>>>>>> hoping not to >>>>>>>>>>>>>>> run into any snags; however, should that happen, I am pleased >>>>>>>>>>>>>>> to know >>>>>>>>>>>>>>> that such a great and vibrant tech group exists that supports >>>>>>>>>>>>>>> and uses >>>>>>>>>>>>>>> HBase :). >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> We're definitely interested in how your project progresses. If >>>>>>>>>>>>>> you are >>>>>>>>>>>>>> ever up in the city, you should drop by for a chat. >>>>>>>>>>>>> >>>>>>>>>>>>> Cool. I'd like that. >>>>>>>>>>>>> >>>>>>>>>>>>>> St.Ack >>>>>>>>>>>>>> >>>>>>>>>>>>>> P.S. I'm also w/ Todd that you should move to 0.89 and blooms. >>>>>>>>>>>>>> P.P.S I updated the wiki on stargate REST: >>>>>>>>>>>>>> http://wiki.apache.org/hadoop/Hbase/Stargate >>>>>>>>>>>>> >>>>>>>>>>>>> Cool, I assume if we move to that it won't kill existing meta >>>>>>>>>>>>> tables, >>>>>>>>>>>>> and data? e.g. is it cross-compatible? >>>>>>>>>>>>> Is 0.89 ready for a production environment? >>>>>>>>>>>>> >>>>>>>>>>>>> -Jack
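(For reference, the region and flush sizes discussed in the thread, 1 GB regions and a 128 MB flush, as an hbase-site.xml fragment; the property names are the standard 0.20/0.89-era ones, the values are just the thread's suggestions, and the fragment is written to a scratch file so no live config is touched. Jack's "40%" question refers to a different knob, the regionserver's global memstore limit on the heap; the per-region flush size below is what Stack is suggesting to raise.)

```shell
# hbase-site.xml fragment with the sizes from the thread. Written to
# /tmp so a live config is not modified; merge into hbase-site.xml by hand.
cat > /tmp/hbase-site-fragment.xml <<'EOF'
<property>
  <!-- split regions at 1 GB instead of the old 256 MB default -->
  <name>hbase.hregion.max.filesize</name>
  <value>1073741824</value>
</property>
<property>
  <!-- flush memstores at 128 MB instead of the 64 MB default -->
  <name>hbase.hregion.memstore.flush.size</name>
  <value>134217728</value>
</property>
EOF
```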
