See inline below.

On Tue, Apr 12, 2011 at 4:38 AM, Michael G. Noll
<[email protected]> wrote:
> So in order to help myself and hopefully also other readers of this
> mailing list, I try to summarize my steps so far to understand and build
> Hadoop 0.20-append for use with HBase 0.90.2, the problems I have run
> into, and I'll also list the pending issues and roadblocks that I haven't
> solved yet.
>

Thanks for putting together this list.

> - FWIW, I compared the Hadoop JAR file shipped with HBase 0.90.1/0.90.2
>  (hadoop-core-0.20-append-r1056497.jar) with the one I built from the
>  latest version of branch-0.20-append.  I noticed that the JAR file in
>  HBase seems to miss the latest commit for HDFS-1554 (SVN rev 1057313
>  aka git commit df0d79cc). In git terms, the Hadoop JAR file shipped
>  in HBase is based on HEAD^1 of branch-0.20-append.  Is there a reason
>  for not including the latest commit?

That is right.

The last commit went in after we'd released 0.90.  We could not pull
it into hbase because the last change on the tip of the hadoop branch
-- HDFS-1554 -- changed the RPC version.  If we'd pulled it in, folks
upgrading from 0.90.0 to 0.90.1 would have been surprised when their
HBase could no longer connect to their hadoop cluster.  (I actually
committed the new hadoop jar and then was convinced we should back it
out; see HBASE-3520.)

That said, we've found that these last few commits on
branch-0.20-append by Hairong are pretty critical.  They provide a
short-circuit that lets the Master grab the lease on WAL files so it
can split them on regionserver crash ("New semantics for
recoverLease").  Before "New semantics..", we recovered the lease by
re-opening the file for append, and we've found that the master can on
occasion fail to take over the WAL file when doing that
open-for-append.

HBase 0.90.2 can make use of this new API.

So, HBase 0.90.2 plus the tip of branch-0.20-append is the recommended
combination.

(The CDH betas did not have HDFS-1554 either.  The release does, and
the HBase included in CDH makes use of the new semantics around lease
recovery.)

> - I also discovered (like Mike Spreitzner did [3]) that there is a
>  BlockChannel.class file in HBase's Hadoop JAR file that seems to come
>  "out of nowhere".  I haven't found it or a reference to it anywhere in
>  the source code.  I decompiled the class [4], and it appears to be an
>  innocent file, maybe used for debugging. A build artifact?
>

Whoops.  Thanks for spotting that.  Build artifact I'd say.  I was
probably trying a patch and didn't clean up properly.
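For anyone else auditing a jar for strays like this, diffing the class lists of the shipped and rebuilt jars finds them quickly.  A sketch below; the two printf lists are placeholders standing in for real `unzip -l <jar>` output, so the commands run anywhere:

```shell
# Placeholder class lists; in practice generate each with something like
#   unzip -l hadoop-core-0.20-append-r1056497.jar | awk '{print $NF}' | sort
tmp=$(mktemp -d)
printf '%s\n' BlockChannel.class DataNode.class NameNode.class > "$tmp/shipped.txt"
printf '%s\n' DataNode.class NameNode.class > "$tmp/built.txt"
# comm -23: lines only in the shipped jar, i.e. classes a clean build does not produce
comm -23 "$tmp/shipped.txt" "$tmp/built.txt"
```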

> Then I tried two different builds:
>
> 1) A first build to replicate and test the Hadoop JAR shipped with HBase
>   0.90.{1,2}, using all commit history up to SVN rev 1056491 aka git
>   e499be8.  The last commit being "HDFS-1555 ..." from 07-Jan-11.
>   In git terms, this is a build based on HEAD^1.
> 2) A second build to create the current version of the Hadoop append
>   branch, using all commit history up to SVN rev 1057313 aka git
>   df0d79cc.  The last commit is "HDFS-1554 ..." from 10-Jan-11.
>   In git terms, this is a build based on HEAD, i.e. the latest version
>   of branch-0.20-append.
>
> Here are my findings:
>
> 1) When I run "ant test" for the append branch version apparently used by
>   HBase 0.90.{1,2}, I consistently run into a build error in
>   TestFileAppend4, logged to
>   build/test/TEST-org.apache.hadoop.hdfs.TestFileAppend4.txt.
>   Details are available at [10].

Yes.  I've since noticed this.  I started to dig in a while back but
got distracted.  I think the test started failing with this commit:

commit 62441fbd516ec9132619d448a1051554d29d2dba
Author: Dhruba Borthakur <[email protected]>
Date:   Thu Jun 17 01:52:50 2010 +0000

    HDFS-1210. DFSClient should log exception when block recovery fails.
    (Todd Lipcon via dhruba)



> 2) When I run "ant test" for the latest version of the append branch, I
>   get the same error as before. However, I sometimes -- not always -- get
>   additional failures/errors for
>    * TEST-org.apache.hadoop.hdfs.server.namenode.TestEditLogRace.txt [11]
>    * TEST-org.apache.hadoop.hdfs.TestMultiThreadedSync.txt [12]
>   both of which look like "general" errors to me.  Maybe a problem of
>   the machine I'm running the build and the tests on?
>

This I have not noticed.


> This leads me to two questions:
>
> 1. Are the test errors described above a known issue that can be ignored?
>   Or did I miss something when building the append branch?
>   From what I have read, my build process should have produced an Hadoop
>   JAR file that is equivalent to the one shipped with HBase.  So any
>   error during my tests should have surfaced for the HBase build, too.
>

See above.


> 2. Is there a way to test whether my custom build is "correct"?  In other
>   words, how can I find out whether the append/syncing works properly
>   so that it does not come to a data loss in HBase at some point.
>   Unfortunately, I haven't found any instructions to intentionally
>   create such a data-loss scenario for verifying whether Hadoop/HBase
>   handles it properly.  St.Ack, for instance, only talks about some
>   basic tests he did himself [13].

Yes.

There are hbase unit tests that will check for lost data.  These
passed before we cut the release.

It's probably little consolation to you, but we've been running 0.90.1
(a 0.90.1 with HBASE-3285 applied, and a CDH3b2 with HDFS-1554 et al.
applied) in production for a good while here where I work, on multiple
clusters.


>   I know someone already asked this question before without receiving
>   a good answer but hey -- there's always hope. :-)
>
>
> Any feedback or pointers would be greatly appreciated!
>
> I'm happy to experiment and to report back.  Since St.Ack's suggestion
> to make a quick, official "append-ready" release of Hadoop for HBase [6]
> was not pursued (I do not want to restart a discussion here), at least I
> would like to help the community with a set of easy-to-follow instructions
> for other people to get HBase and Hadoop 0.20.x up and running.
>

You are a good man, Michael.  Sounds like I need to update our Manual
at least to include the info above.

Thanks for doing the digging and taking the time to craft the note above,
St.Ack


> Best,
> Michael
>
>
> PS: And congratulations for getting 0.90.2 out. Your work is really
> appreciated! :-)
>
>
> [1] http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/
> [2] http://hbase.apache.org/book/notsoquick.html#hadoop
> [3] http://search-hadoop.com/m/mfUkf2EEiaf
> [4] http://pastebin.ubuntu.com/587699/
> [5] http://wiki.apache.org/hadoop/GitAndHadoop
> [6] http://www.mail-archive.com/[email protected]/msg02543.html
> [7] http://www.mail-archive.com/[email protected]/msg06772.html
> [8] http://www.mail-archive.com/[email protected]/msg07060.html
> [9] http://www.mail-archive.com/[email protected]/msg02785.html
> [10] http://pastebin.ubuntu.com/593073/
> [11] http://pastebin.ubuntu.com/593075/
> [12] http://pastebin.ubuntu.com/593076/
> [13] http://www.mail-archive.com/[email protected]/msg07158.html
>
