Where should I download branch-0.20-append? I can't get a compiled jar from the following URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append .
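(Viewvc is only a repository browser, and no compiled jar of the append branch was published there; the usual route was to check the branch out of Subversion and build it yourself. A minimal sketch, assuming a JDK and Apache Ant are installed; the plain-SVN path is inferred from the viewvc URL above, and the exact name of the jar under build/ depends on the branch's own build.xml, so verify locally:

  # check out the append branch (repo path inferred from the viewvc URL)
  svn checkout \
    http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-append \
    hadoop-0.20-append
  cd hadoop-0.20-append

  # build the core jar with Ant; the result lands under build/
  # as a hadoop-*-core.jar
  ant jar
)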
On Tue, Dec 14, 2010 at 1:44 AM, Stack <[email protected]> wrote:
> Some comments inline below.
>
> On Mon, Dec 13, 2010 at 8:45 AM, baggio liu <[email protected]> wrote:
> > Hi Anze,
> > Our production cluster runs HBase 0.20.6 and HDFS (CDH3b2), and we have worked
> > on stability for about a month. We have met some issues that may be helpful
> > to you.
> >
>
> Thanks for writing back to the list with your experiences.
>
> > HDFS:
> > 1. HBase files have a shorter life cycle than map-reduce files; sometimes there are
> > many blocks to delete, so the speed of HDFS invalid-block deletion should be tuned.
> >
>
> This can be true. Yes. What are you suggesting here? What should we
> tune?
>
> > 2. The hadoop 0.20 branch cannot deal with disk failure; HDFS-630 will be
> > helpful.
> >
>
> HDFS-630 has been applied to the branch-0.20-append branch (it's also
> in CDH, IIRC).
>
> > 3. The region server does not handle IOException correctly. When DFSClient meets a
> > network error, it throws an IOException, which may not be fatal for the region
> > server, so these IOExceptions MUST be reviewed.
> >
>
> Usually if a RegionServer has issues getting to HDFS, it'll shut itself
> down. This is 'normal', perhaps overly-defensive, behavior. The story
> should be better in 0.90, but I would be interested in any list you might
> have of cases where you think we should be able to catch and continue.
>
> > 4. In a large-scale scan, there are many concurrent readers in a short time.
> >
>
> Just FYI, HBase opens all files and keeps them open on startup.
> There will be pressure on file handles and threads in data nodes as soon
> as you start up an HBase instance. Scans use the already opened files,
> so whether there are 1 or N ongoing Scans, the pressure on HDFS is the same.
>
> > We must set the datanode dataxceiver count to a large number, and the file-handle
> > limit should be tuned. In addition, connection reuse between DFSClient
> > and datanode should be implemented.
> >
>
> Yes. This is in our requirements for HBase. Here is the latest from
> the 0.90.0RC HBase 'book':
>
> http://people.apache.org/~stack/hbase-0.90.0-candidate-1/docs/notsoquick.html#ulimit
>
> What do you mean by connection reuse?
>
> > HBase
> > 1. Single-threaded compaction limits the speed of compaction; it should be
> > made multi-threaded. (During multi-threaded compaction we should limit the network
> > bandwidth used by compaction.)
>
> True, but also in 0.90 the compaction algorithm is smarter; there is less to do.
>
> > 2. Single-threaded HLog splitting (reading the HLog) makes HBase down time
> > longer; making it multi-threaded can limit HBase down time.
> >
>
> True in 0.20, but in 0.90 splits are much faster; daughter regions come up
> immediately on the regionserver that hosted the parent that split,
> rather than going back to the master for the master to assign out the new
> daughter regions.
>
> > 3. Additionally, some tools should be provided, such as a meta region checker,
> > a fixer, and so on.
> >
>
> Yes. In 0.90, we have the hbck tool to run checks and report on
> inconsistencies.
>
> > 4. The zookeeper session timeout should be tuned according to the load on
> > your HBase cluster.
>
> Yes. The ZooKeeper ping is the regionserver's lifeline to the cluster. If
> it goes amiss, the regionserver is considered lost and the master will
> take restorative action.
>
> > 5. The GC strategy should be tuned on your region server/HMaster.
> >
>
> Yes. Any suggestions from your experience?
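(On the dataxceiver and file-handle point above, the tuning of that era lived in hdfs-site.xml and the OS limits. A minimal sketch with illustrative values; 4096 and 32768 match common recommendations from the HBase documentation of the time, and the "hadoop" user name is an assumption:

  <!-- hdfs-site.xml on each datanode; note the property's historical
       misspelling ("xcievers"), which must be reproduced exactly -->
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>

  # /etc/security/limits.conf: raise the open-file limit for the
  # account running the datanode/regionserver ("hadoop" is an assumed name)
  hadoop  -  nofile  32768
)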
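(For items 4 and 5 above, the usual knobs were zookeeper.session.timeout in hbase-site.xml and the JVM flags in hbase-env.sh. A sketch with illustrative values, not settings taken from this thread:

  <!-- hbase-site.xml: how long ZooKeeper waits before declaring a
       regionserver dead; longer tolerates GC pauses, shorter recovers faster -->
  <property>
    <name>zookeeper.session.timeout</name>
    <value>60000</value>
  </property>

  # hbase-env.sh: use the concurrent collector so GC pauses stay shorter
  # than the session timeout above (flags are a common starting point)
  export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"
)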
> > Besides the above, in a production cluster the data loss issue should be fixed as
> > well. (Currently the hadoop 0.20-append branch and CDH3b2 hadoop can be used.)
> >
>
> Yes. Here is the 0.90 doc on hadoop versions:
>
> http://people.apache.org/~stack/hbase-0.90.0-candidate-1/docs/notsoquick.html#hadoop
>
> > Because hdfs makes many optimizations for throughput, for an application
> > like HBase (many random reads/writes) much tuning of and many changes to hdfs
> > should be done.
>
> Do you have suggestions? A list?
>
> Thanks for writing the list, Baggio.
> St.Ack
>
> > Hope this experience can be helpful to you.
> >
> > Thanks & Best regards,
> > Baggio
> >
> > 2010/12/14 Todd Lipcon <[email protected]>
> >
> >> Hi Anze,
> >>
> >> In a word, yes - 0.20.4 is not that stable in my experience, and
> >> upgrading to the latest CDH3 beta (which includes HBase 0.89.20100924)
> >> should give you a huge improvement in stability.
> >>
> >> You'll still need to do a bit of tuning of settings, but once it's
> >> well tuned it should be able to hold up under load without crashing.
> >>
> >> -Todd
> >>
> >> On Mon, Dec 13, 2010 at 2:41 AM, Anze <[email protected]> wrote:
> >> > Hi all!
> >> >
> >> > We have been using HBase 0.20.4 (cdh3b1) in production on 2 nodes for a few
> >> > months now and we are having constant issues with it. We fell into all the
> >> > standard traps (like "Too many open files", network configuration
> >> > problems, ...). All in all, we had about one crash every week or so.
> >> > Fortunately we are still using it just for background processing, so our
> >> > service didn't suffer directly, but we have lost huge amounts of time just
> >> > fixing the data errors that resulted from data not being written to permanent
> >> > storage. Not to mention fixing the issues themselves.
> >> > As you can probably understand, we are very frustrated with this and are
> >> > seriously considering moving to another bigtable.
> >> >
> >> > Right now, HBase crashes whenever we run a very intensive rebuild of a secondary
> >> > index (a normal table, but we use it as a secondary index) on a huge table. I have
> >> > found this:
> >> > http://wiki.apache.org/hadoop/Hbase/Troubleshooting
> >> > (see problem 9)
> >> > One of the lines reads:
> >> > "Make sure you give plenty of RAM (in hbase-env.sh), the default of 1GB won't
> >> > be able to sustain long running imports."
> >> >
> >> > So, if I understand correctly, no matter how HBase is set up, if I run an
> >> > intensive enough application, it will choke? I would expect it to be slower
> >> > when under (too much) pressure, but not to crash.
> >> >
> >> > Of course, we will somehow solve this issue (working on it), but... :(
> >> >
> >> > What are your experiences with HBase? Is it stable? Is it just us and the way
> >> > we set it up?
> >> >
> >> > Also, would upgrading to 0.89 (cdh3b3) help?
> >> >
> >> > Thanks,
> >> >
> >> > Anze
> >> >
> >>
> >> --
> >> Todd Lipcon
> >> Software Engineer, Cloudera
> >>
>
--
best wishes
jiajun
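(As a footnote to the wiki line Anze quotes: in the 0.20 line, the "plenty of RAM" setting is HBASE_HEAPSIZE in hbase-env.sh, given in megabytes and defaulting to 1000. A minimal sketch, where 4000 is an illustrative value for a machine with memory to spare:

  # hbase-env.sh: raise the regionserver/master heap from the 1GB default
  # (the value is in MB; leave headroom for the OS and the datanode)
  export HBASE_HEAPSIZE=4000
)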
