Where should I download branch-0.20-append? I can't get a compiled jar from the following URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append .
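(Viewvc is only a repository browser, and no compiled jar of the append branch was published there; the usual route was to check the branch out of Subversion and build it yourself. A minimal sketch, assuming a JDK and Apache Ant are installed; the plain-SVN path is inferred from the viewvc URL above, and the exact name of the jar under build/ depends on the branch's own build.xml, so verify locally:

  # check out the append branch (repo path inferred from the viewvc URL)
  svn checkout \
    http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-append \
    hadoop-0.20-append
  cd hadoop-0.20-append

  # build the core jar with Ant; the result lands under build/
  # as a hadoop-*-core.jar
  ant jar
)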
On Tue, Dec 14, 2010 at 1:44 AM, Stack <[email protected]> wrote:
> Some comments inline below.
>
> On Mon, Dec 13, 2010 at 8:45 AM, baggio liu <[email protected]> wrote:
> > Hi Anze,
> > Our production cluster runs HBase 0.20.6 and HDFS (CDH3b2), and we have worked
> > on stability for about a month. We have met some issues that may be helpful
> > to you.
> >
>
> Thanks for writing back to the list with your experiences.
>
> > HDFS:
> > 1. HBase files have a shorter life cycle than map-reduce files; sometimes there are
> > many blocks to delete, so the speed of HDFS invalid-block deletion should be tuned.
> >
>
> This can be true. Yes. What are you suggesting here? What should we
> tune?
>
> > 2. The hadoop 0.20 branch cannot deal with disk failure; HDFS-630 will be
> > helpful.
> >
>
> HDFS-630 has been applied to the branch-0.20-append branch (it's also
> in CDH, IIRC).
>
> > 3. The region server does not handle IOException correctly. When DFSClient meets a
> > network error, it throws an IOException, which may not be fatal for the region
> > server, so these IOExceptions MUST be reviewed.
> >
>
> Usually if a RegionServer has issues getting to HDFS, it'll shut itself
> down. This is 'normal', perhaps overly-defensive, behavior. The story
> should be better in 0.90, but I would be interested in any list you might
> have of cases where you think we should be able to catch and continue.
>
> > 4. In a large-scale scan, there are many concurrent readers in a short time.
> >
>
> Just FYI, HBase opens all files and keeps them open on startup.
> There will be pressure on file handles and threads in data nodes as soon
> as you start up an HBase instance. Scans use the already opened files,
> so whether there are 1 or N ongoing Scans, the pressure on HDFS is the same.
>
> > We must set the datanode dataxceiver count to a large number, and the file-handle
> > limit should be tuned. In addition, connection reuse between DFSClient
> > and datanode should be implemented.
> >
>
> Yes. This is in our requirements for HBase. Here is the latest from
> the 0.90.0RC HBase 'book':
>
> http://people.apache.org/~stack/hbase-0.90.0-candidate-1/docs/notsoquick.html#ulimit
>
> What do you mean by connection reuse?
>
> > HBase
> > 1. Single-threaded compaction limits the speed of compaction; it should be
> > made multi-threaded. (During multi-threaded compaction we should limit the network
> > bandwidth used by compaction.)
>
> True, but also in 0.90 the compaction algorithm is smarter; there is less to do.
>
> > 2. Single-threaded HLog splitting (reading the HLog) makes HBase down time
> > longer; making it multi-threaded can limit HBase down time.
> >
>
> True in 0.20, but in 0.90 splits are much faster; daughter regions come up
> immediately on the regionserver that hosted the parent that split,
> rather than going back to the master for the master to assign out the new
> daughter regions.
>
> > 3. Additionally, some tools should be provided, such as a meta region checker,
> > a fixer, and so on.
> >
>
> Yes. In 0.90, we have the hbck tool to run checks and report on
> inconsistencies.
>
> > 4. The zookeeper session timeout should be tuned according to the load on
> > your HBase cluster.
>
> Yes. The ZooKeeper ping is the regionserver's lifeline to the cluster. If
> it goes amiss, the regionserver is considered lost and the master will
> take restorative action.
>
> > 5. The GC strategy should be tuned on your region server/HMaster.
> >
>
> Yes. Any suggestions from your experience?
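(On the dataxceiver and file-handle point above, the tuning of that era lived in hdfs-site.xml and the OS limits. A minimal sketch with illustrative values; 4096 and 32768 match common recommendations from the HBase documentation of the time, and the "hadoop" user name is an assumption:

  <!-- hdfs-site.xml on each datanode; note the property's historical
       misspelling ("xcievers"), which must be reproduced exactly -->
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>

  # /etc/security/limits.conf: raise the open-file limit for the
  # account running the datanode/regionserver ("hadoop" is an assumed name)
  hadoop  -  nofile  32768
)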
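(For items 4 and 5 above, the usual knobs were zookeeper.session.timeout in hbase-site.xml and the JVM flags in hbase-env.sh. A sketch with illustrative values, not settings taken from this thread:

  <!-- hbase-site.xml: how long ZooKeeper waits before declaring a
       regionserver dead; longer tolerates GC pauses, shorter recovers faster -->
  <property>
    <name>zookeeper.session.timeout</name>
    <value>60000</value>
  </property>

  # hbase-env.sh: use the concurrent collector so GC pauses stay shorter
  # than the session timeout above (flags are a common starting point)
  export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"
)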
> > Besides the above, in a production cluster the data loss issue should be fixed as
> > well. (Currently the hadoop 0.20-append branch and CDH3b2 hadoop can be used.)
> >
>
> Yes. Here is the 0.90 doc on hadoop versions:
>
> http://people.apache.org/~stack/hbase-0.90.0-candidate-1/docs/notsoquick.html#hadoop
>
> > Because hdfs makes many optimizations for throughput, for an application
> > like HBase (many random reads/writes) much tuning of and many changes to hdfs
> > should be done.
>
> Do you have suggestions? A list?
>
> Thanks for writing the list, Baggio.
> St.Ack
>
> > Hope this experience can be helpful to you.
> >
> > Thanks & Best regards,
> > Baggio
> >
> > 2010/12/14 Todd Lipcon <[email protected]>
> >
> >> Hi Anze,
> >>
> >> In a word, yes - 0.20.4 is not that stable in my experience, and
> >> upgrading to the latest CDH3 beta (which includes HBase 0.89.20100924)
> >> should give you a huge improvement in stability.
> >>
> >> You'll still need to do a bit of tuning of settings, but once it's
> >> well tuned it should be able to hold up under load without crashing.
> >>
> >> -Todd
> >>
> >> On Mon, Dec 13, 2010 at 2:41 AM, Anze <[email protected]> wrote:
> >> > Hi all!
> >> >
> >> > We have been using HBase 0.20.4 (cdh3b1) in production on 2 nodes for a few
> >> > months now and we are having constant issues with it. We fell into all the
> >> > standard traps (like "Too many open files", network configuration
> >> > problems, ...). All in all, we had about one crash every week or so.
> >> > Fortunately we are still using it just for background processing, so our
> >> > service didn't suffer directly, but we have lost huge amounts of time just
> >> > fixing the data errors that resulted from data not being written to permanent
> >> > storage. Not to mention fixing the issues themselves.
> >> > As you can probably understand, we are very frustrated with this and are
> >> > seriously considering moving to another bigtable.
> >> >
> >> > Right now, HBase crashes whenever we run a very intensive rebuild of a secondary
> >> > index (a normal table, but we use it as a secondary index) on a huge table. I have
> >> > found this:
> >> > http://wiki.apache.org/hadoop/Hbase/Troubleshooting
> >> > (see problem 9)
> >> > One of the lines reads:
> >> > "Make sure you give plenty of RAM (in hbase-env.sh), the default of 1GB won't
> >> > be able to sustain long running imports."
> >> >
> >> > So, if I understand correctly, no matter how HBase is set up, if I run an
> >> > intensive enough application, it will choke? I would expect it to be slower
> >> > when under (too much) pressure, but not to crash.
> >> >
> >> > Of course, we will somehow solve this issue (working on it), but... :(
> >> >
> >> > What are your experiences with HBase? Is it stable? Is it just us and the way
> >> > we set it up?
> >> >
> >> > Also, would upgrading to 0.89 (cdh3b3) help?
> >> >
> >> > Thanks,
> >> >
> >> > Anze
> >> >
> >>
> >> --
> >> Todd Lipcon
> >> Software Engineer, Cloudera
> >>
>
--
best wishes
jiajun
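(As a footnote to the wiki line Anze quotes: in the 0.20 line, the "plenty of RAM" setting is HBASE_HEAPSIZE in hbase-env.sh, given in megabytes and defaulting to 1000. A minimal sketch, where 4000 is an illustrative value for a machine with memory to spare:

  # hbase-env.sh: raise the regionserver/master heap from the 1GB default
  # (the value is in MB; leave headroom for the OS and the datanode)
  export HBASE_HEAPSIZE=4000
)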
