Well, I don't really run CDH; I disagree with their rpm/deb packaging policies and have to strongly recommend against using DEBs to install software...

So normally, installing from tarball, the jar is in <installpath>/hadoop-0.20.0-320/hadoop-core-0.20.2+320.jar. On the CDH/DEB edition it's somewhere silly... locate and find will be your friends. It should be called hadoop-core-0.20.2+320.jar though!
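(A rough sketch of that locate-and-swap, assuming an HBase tarball unpacked under /opt/hbase and a CDH package install; every path here is illustrative, not canonical:)

    # Locate the CDH-installed hadoop core jar (package layouts vary)
    locate hadoop-core-0.20.2+320.jar

    # Swap it in for the hadoop jar HBase ships with, so HBase and HDFS
    # run the same RPC and append code (paths are illustrative)
    rm /opt/hbase/lib/hadoop-*core*.jar
    cp /usr/lib/hadoop-0.20/hadoop-core-0.20.2+320.jar /opt/hbase/lib/

    # Restart HDFS and HBase afterwards so the replacement takes effect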
I'm working on a github publish of SU's production system, which uses the cloudera maven repo to install the correct JAR in hbase, so when you type 'mvn assembly:assembly' to build your own hbase-*-bin.tar.gz (the * being whatever version you specified in pom.xml) the cdh3b2 jar comes pre-packaged. Stay tuned :-)

-ryan

On Mon, Sep 20, 2010 at 8:36 PM, Jack Levin <[email protected]> wrote:
> Ryan, the hadoop jar: what is the usual path to the file? I just want to be
> sure, and where do I put it?
>
> -Jack
>
> On Mon, Sep 20, 2010 at 8:30 PM, Ryan Rawson <[email protected]> wrote:
>> You need 2 more things:
>>
>> - restart hdfs
>> - make sure the hadoop jar from your install replaces the one we ship with
>>
>> On Mon, Sep 20, 2010 at 8:22 PM, Jack Levin <[email protected]> wrote:
>>> So, I switched to 0.89, and we already had CDH3
>>> (hadoop-0.20-datanode-0.20.2+320-3.noarch). Even though I added
>>> <name>dfs.support.append</name> as true to both hdfs-site.xml and
>>> hbase-site.xml, the master still reports this:
>>>
>>> You are currently running the HMaster without HDFS append support
>>> enabled. This may result in data loss. Please see the HBase wiki for
>>> details.
>>>
>>> Master Attributes
>>> Attribute Name | Value | Description
>>> HBase Version | 0.89.20100726, r979826 | HBase version and svn revision
>>> HBase Compiled | Sat Jul 31 02:01:58 PDT 2010, stack | When HBase version was compiled and by whom
>>> Hadoop Version | 0.20.2, r911707 | Hadoop version and svn revision
>>> Hadoop Compiled | Fri Feb 19 08:07:34 UTC 2010, chrisdo | When Hadoop version was compiled and by whom
>>> HBase Root Directory | hdfs://namenode-rd.imageshack.us:9000/hbase | Location of HBase home directory
>>>
>>> Any ideas what's wrong?
>>>
>>> -Jack
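(For reference, a minimal sketch of the append flag Jack describes; the same stanza goes in both hdfs-site.xml and hbase-site.xml. The flag only takes effect on an append-capable build like CDH3, and the "Hadoop Version 0.20.2, r911707" in the status dump above is stock Apache Hadoop, which suggests HBase was still running against its bundled jar, exactly what Ryan's jar swap addresses.)

    <!-- hdfs-site.xml and hbase-site.xml: enable durable WAL appends -->
    <property>
      <name>dfs.support.append</name>
      <value>true</value>
    </property>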
>>> On Mon, Sep 20, 2010 at 5:47 PM, Ryan Rawson <[email protected]> wrote:
>>>> Hey,
>>>>
>>>> There is actually only 1 active branch of hbase, that being the 0.89
>>>> release, which is based on 'trunk'. We have snapshotted a series of
>>>> 0.89 "developer releases" in hopes that people would try them out and
>>>> start thinking about the next major version. One of these is what SU
>>>> is running prod on.
>>>>
>>>> At this point, tracking 0.89 and which ones are the 'best' patch sets
>>>> to run is a bit of a contact sport, but if you are serious about not
>>>> losing data it is worthwhile. SU is based on the most recent DR with
>>>> a few minor patches of our own concoction brought in. The current DR
>>>> works, but some Master ops are slow, so there are a few patches on
>>>> top of that. I'll poke about and see if it's possible to publish to a
>>>> github branch or something.
>>>>
>>>> -ryan
>>>>
>>>> On Mon, Sep 20, 2010 at 5:16 PM, Jack Levin <[email protected]> wrote:
>>>>> Sounds good; the only reason I ask is because of this:
>>>>>
>>>>> There are currently two active branches of HBase:
>>>>>
>>>>> * 0.20 - the current stable release series, being maintained with
>>>>> patches for bug fixes only. This release series does not support HDFS
>>>>> durability - edits may be lost in the case of node failure.
>>>>> * 0.89 - a development release series with active feature and
>>>>> stability development, not currently recommended for production use.
>>>>> This release does support HDFS durability - cases in which edits are
>>>>> lost are considered serious bugs.
>>>>>
>>>>> Are we talking about data loss in case of a datanode going down while
>>>>> being written to, or a RegionServer going down?
>>>>>
>>>>> -jack
>>>>>
>>>>> On Mon, Sep 20, 2010 at 4:09 PM, Ryan Rawson <[email protected]> wrote:
>>>>>> We run 0.89 in production @ StumbleUpon. We also employ 3 committers...
>>>>>>
>>>>>> As for safety, you have no choice but to run 0.89. If you run a 0.20
>>>>>> release you will lose data. You must be on 0.89 and
>>>>>> CDH3/append-branch to achieve data durability, and there really is no
>>>>>> argument around it. If you are doing your tests with 0.20.6 now, I'd
>>>>>> stop and rebase those tests onto the latest DR announced on the list.
>>>>>>
>>>>>> -ryan
>>>>>>
>>>>>> On Mon, Sep 20, 2010 at 3:17 PM, Jack Levin <[email protected]> wrote:
>>>>>>> Hi Stack, see inline:
>>>>>>>
>>>>>>> On Mon, Sep 20, 2010 at 2:42 PM, Stack <[email protected]> wrote:
>>>>>>>> Hey Jack:
>>>>>>>>
>>>>>>>> Thanks for writing.
>>>>>>>>
>>>>>>>> See below for some comments.
>>>>>>>>
>>>>>>>> On Mon, Sep 20, 2010 at 11:00 AM, Jack Levin <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> ImageShack gets close to two million image uploads per day, which are
>>>>>>>>> usually stored on regular servers (we have about 700), as regular
>>>>>>>>> files, and each server has its own host name, such as (img55). I've
>>>>>>>>> been researching how to improve our backend design in terms of data
>>>>>>>>> safety and stumbled onto the HBase project.
>>>>>>>>>
>>>>>>>> Any other requirements besides data safety? (latency, etc.)
>>>>>>>
>>>>>>> Latency is the second requirement. We have some services that are
>>>>>>> very short-tail and can produce a 95% cache hit rate, so I assume this
>>>>>>> would really put a cache to good use. Some other services, however,
>>>>>>> have about a 25% cache hit ratio, in which case the latency should be
>>>>>>> 'adequate', e.g. if it's slightly worse than getting data off raw disk,
>>>>>>> then it's good enough. Safety is supremely important, then
>>>>>>> availability, then speed.
>>>>>>>
>>>>>>>>> Now, I think HBase is the most beautiful thing that has happened to
>>>>>>>>> the distributed DB world :). The idea is to store image files (about
>>>>>>>>> 400KB on average) in HBase.
>>>>>>>>
>>>>>>>> I'd guess some images are much bigger than this. Do you ever limit
>>>>>>>> the size of images folks can upload to your service?
>>>>>>>>
>>>>>>>>> The setup will include the following configuration:
>>>>>>>>>
>>>>>>>>> 50 servers total (2 datacenters), with 8 GB RAM, dual core cpu, 6 x
>>>>>>>>> 2TB disks each.
>>>>>>>>> 3 to 5 Zookeepers
>>>>>>>>> 2 Masters (one in each datacenter)
>>>>>>>>> 10 to 20 Stargate REST instances (one per server, hash loadbalanced)
>>>>>>>>
>>>>>>>> What's your frontend? Why REST? It might be more efficient if you
>>>>>>>> could run with thrift, given REST base64s its payload IIRC (check the
>>>>>>>> src yourself).
>>>>>>>
>>>>>>> For insertion we use HAProxy and balance curl PUTs across multiple
>>>>>>> REST APIs.
>>>>>>> For reading, it's an nginx proxy that does Content-Type modification
>>>>>>> from image/jpeg to octet-stream, and vice versa; it then hits HAProxy
>>>>>>> again, which hits the balanced REST instances.
>>>>>>> Why REST? It was the simplest thing to run, given that it supports
>>>>>>> HTTP. Potentially we could rewrite something for thrift, as long as we
>>>>>>> can still use HTTP to send and receive data (has anyone written
>>>>>>> anything like that, say in python, C or java?)
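(A sketch of the kind of curl insert and fetch being load-balanced here, against Stargate's single-cell API; the table name 'images', column 'img:data', and host:port are made up for illustration. Asking for application/octet-stream keeps the payload as raw bytes on the wire, sidestepping the base64 overhead Stack mentions.)

    # store an uploaded image as a raw cell value (single-cell PUT)
    curl -X PUT \
      -H "Content-Type: application/octet-stream" \
      --data-binary @photo.jpg \
      http://stargate-host:8080/images/row-key-55/img:data

    # read it back as raw bytes rather than base64'd XML/JSON
    curl -H "Accept: application/octet-stream" \
      http://stargate-host:8080/images/row-key-55/img:data > photo.jpg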
>>>>>>>>> 40 to 50 RegionServers (will probably keep the masters separate on
>>>>>>>>> dedicated boxes).
>>>>>>>>> 2 Namenode servers (one backup, highly available, will do fsimage and
>>>>>>>>> edits snapshots also)
>>>>>>>>>
>>>>>>>>> So far I've got about 13 servers running, doing about 20 insertions /
>>>>>>>>> second (file sizes ranging from a few KB to 2-3MB, avg. 400KB) via the
>>>>>>>>> Stargate API. Our frontend servers receive files, and I just
>>>>>>>>> fork-insert them into Stargate via http (curl).
>>>>>>>>> The inserts are humming along nicely, without any noticeable load on
>>>>>>>>> the regionservers; so far we've inserted about 2 TB worth of images.
>>>>>>>>> I have adjusted the region file size to 512MB, and the table block
>>>>>>>>> size to about 400KB, trying to match the average access block to
>>>>>>>>> limit HDFS trips.
>>>>>>>>
>>>>>>>> As Todd suggests, I'd go up from 512MB... 1G at least. You'll
>>>>>>>> probably want to up your flush size from 64MB to 128MB or maybe 192MB.
>>>>>>>
>>>>>>> Yep, I will adjust to 1G. I thought flushing was controlled by a
>>>>>>> function of memstore heap, something like 40%? Or are you talking
>>>>>>> about the HDFS block size?
>>>>>>>
>>>>>>>>> So far the read performance has been more than adequate, and of
>>>>>>>>> course write performance is nowhere near capacity.
>>>>>>>>> So right now, all newly uploaded images go to HBase. But we do plan
>>>>>>>>> to insert about 170 million images (about 100 days' worth), which is
>>>>>>>>> only about 64 TB, or 10% of the planned cluster size of 600TB.
>>>>>>>>> The end goal is to have a storage system that provides data safety,
>>>>>>>>> e.g. the system may go down but data cannot be lost. Our front-end
>>>>>>>>> servers will continue to serve images from their own file systems (we
>>>>>>>>> are serving about 16 Gbit at peak); however, should we need to bring
>>>>>>>>> any of those down for maintenance, we will redirect all traffic to
>>>>>>>>> HBase (should be no more than a few hundred Mbps) while the front-end
>>>>>>>>> server is repaired (for example, having its disk replaced). After the
>>>>>>>>> repairs, we quickly repopulate it with the missing files, while
>>>>>>>>> serving the remaining missing ones off HBase.
>>>>>>>>> All in all it should be a very interesting project, and I am hoping
>>>>>>>>> not to run into any snags; however, should that happen, I am pleased
>>>>>>>>> to know that such a great and vibrant tech group exists that supports
>>>>>>>>> and uses HBase :).
>>>>>>>>
>>>>>>>> We're definitely interested in how your project progresses. If you
>>>>>>>> are ever up in the city, you should drop by for a chat.
>>>>>>>
>>>>>>> Cool. I'd like that.
>>>>>>>
>>>>>>>> St.Ack
>>>>>>>>
>>>>>>>> P.S. I'm also w/ Todd that you should move to 0.89 and blooms.
>>>>>>>> P.P.S. I updated the wiki on stargate REST:
>>>>>>>> http://wiki.apache.org/hadoop/Hbase/Stargate
>>>>>>>
>>>>>>> Cool. I assume if we move to that, it won't kill the existing meta
>>>>>>> tables and data? e.g. is it cross-compatible?
>>>>>>> Is 0.89 ready for a production environment?
>>>>>>>
>>>>>>> -Jack
>>>>>>
>>>>>
>>>>
>>>
>>
>
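(A closing note on the two knobs in that exchange: the ~40% of heap Jack recalls is the global memstore limit across all regions, while the flush size Stack is raising is per-region. A minimal hbase-site.xml sketch of the tuning discussed, with values from the thread and property names as used in 0.89-era HBase:)

    <!-- grow regions to 1G before splitting, per Stack/Todd's suggestion -->
    <property>
      <name>hbase.hregion.max.filesize</name>
      <value>1073741824</value>
    </property>

    <!-- flush each region's memstore at 128MB instead of the 64MB default -->
    <property>
      <name>hbase.hregion.memstore.flush.size</name>
      <value>134217728</value>
    </property>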
