Let me answer some parts to the best of my ability and let others add the things I'm missing.
On Wed, Jan 15, 2020 at 9:25 PM, Luca Toscano <toscano.l...@gmail.com> wrote:
> Hi everybody,
>
> I am part of the Analytics team of the Wikimedia Foundation. We are
> currently managing a CDH 5.16.1 Hadoop cluster (on Debian 9 Stretch
> hosts), and for various reasons we'd love to explore the possibility
> of moving to BigTop :)
>
> The long term plan that we have in mind is something like the following:
>
> - Move from CDH 5.16.1 to BigTop 1.4 (that IIUC is the last Hadoop 2.x
> release)
> - Upgrade to BigTop 1.5 (very delicate since IIUC it upgrades Hadoop to
> 3.x)

The 1.5 BOM is not finalized yet. It could ship either Hadoop 2.x or Hadoop 3, depending on whether the Hadoop 3 packaging issues get solved. You can refer to the discussion thread [1]. As far as I can recall, Hadoop 3's shell script rewrite makes packaging really challenging. Some of our folks have done a POC at [2], but the problem is not fully solved.

> - Upgrade the OS to Debian 10 Buster
>
> All the BigTop packages seem to be enough for our use cases (we
> already have our own puppet automation), the only thing left would be
> Hue but it is easy to package it (or re-use the CDH version as interim
> solution). I have a couple of questions for you:
>

Hue was included up to and including 1.2.1, but was dropped starting with the 1.3 release; that was done in [3]. I can't recall the reason for dropping Hue, though...

> 1) Has anybody attempted something similar in the past? If so, there
> is some documentation and/or advice about how to do the migration?
> From what I gathered CDH is based upon BigTop so the only difference
> would be the Hadoop version (2.6 vs 2.8.5, but CDH's one is heavily
> patched so not sure what version it could be compared to). Hive also
> changes between the distro (1.1 vs 2.x), but we are looking forward to
> upgrade!
>

Five years ago my company was on CDH4 (packages with our own puppet, no Cloudera Manager) and we decided to move to Bigtop 0.8. You can refer to [4][5]. Basically, the idea is:

1. Do a parallel migration.
2. Categorize the data into hot and cold data.
3. [cold data] Establish Kerberos federation between the two clusters and move the cold data via cross-cluster distcp under the hood.
4. [hot data] Run distcp with the users' cooperation.

>
> 2) Is there any documentation about how to move from Hadoop 2 to
> Hadoop 3 using BigTop? As far as I know the procedure is very delicate
> and needs to be done with precise steps (I am mostly concerned of HDFS
> consistency).
>

No. My suggestion is to refer to the Hadoop or Cloudera upgrade guides. I've previously done an upgrade from Hadoop 1.x to 2.x; basically, you just follow the guide. If you have a staging cluster, try it there first.

>
> 3) As far as I know Debian 10 (Buster) ships only with openjdk-11, but
> we were planning to keep using openjdk-8 for the near/medium-term.
> From
> https://github.com/apache/bigtop/blob/master/bigtop_toolchain/manifests/jdk.pp#L25-L41
> it seems that BigTop is aligned with this goal, but better to double
> check.
>

AFAIK this is aligned; however, things are subject to change, so there is no guarantee until the release has been made ;)

>
> Thanks in advance!
>
> Luca

Let me know if you have more questions :)

[1] https://lists.apache.org/thread.html/2f80388a1f87bed20de2bb61882e734d76623896812cd0ae168b8ff5%40%3Cdev.bigtop.apache.org%3E
[2] https://github.com/apache/bigtop/tree/bigtop-alpha
[3] https://issues.apache.org/jira/browse/BIGTOP-3021
[4] https://www.slideshare.net/takeshi_miao/zerodowntime-hadoophbase-crossdatacenter-migration
[5] https://www.slideshare.net/YafangChang/hadoopcon2015-multicluster-live-synchronization-with-kerberos-federated-hadoop
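The hot/cold split and the cross-cluster distcp for cold data (steps 2 and 3 of the migration idea in the answer to question 1) can be sketched roughly as below. This is a minimal, hypothetical Python illustration, not the tooling we actually used. It assumes you already have a last-access timestamp per HDFS path (e.g. from an fsimage dump), treats anything untouched for 90 days as cold, and only builds `hadoop distcp` command lines for the cold set instead of running them. The namenode addresses, the 90-day threshold and the names `PathStat`, `split_hot_cold` and `distcp_command` are all made up for illustration. Kerberos federation between the two clusters must already be in place for such a copy to work.

```python
"""Hypothetical sketch: split HDFS paths into hot and cold sets by last
access time, then build `hadoop distcp` command lines for the cold set.

Assumptions (illustrative only): the namenode addresses, the 90-day
threshold and the per-path access timestamps are inputs you supply.
"""
from dataclasses import dataclass
from typing import List, Tuple

DAY_SECONDS = 24 * 60 * 60


@dataclass
class PathStat:
    path: str         # HDFS path, e.g. "/user/alice/logs"
    last_access: int  # last access time, seconds since the epoch


def split_hot_cold(stats: List[PathStat], now: int,
                   cold_after_days: int = 90) -> Tuple[List[str], List[str]]:
    """Return (hot, cold); a path is cold if it was last accessed
    strictly before now - cold_after_days (hypothetical threshold)."""
    cutoff = now - cold_after_days * DAY_SECONDS
    hot = [s.path for s in stats if s.last_access >= cutoff]
    cold = [s.path for s in stats if s.last_access < cutoff]
    return hot, cold


def distcp_command(path: str, src_nn: str, dst_nn: str) -> List[str]:
    """Build an argv copying one path from the old to the new cluster.

    -update copies only files that differ at the destination (and mirrors
    the source directory's contents into the same target path);
    -p with no argument preserves file attributes such as owner,
    group, permissions and replication.
    """
    return ["hadoop", "distcp", "-update", "-p",
            f"hdfs://{src_nn}{path}", f"hdfs://{dst_nn}{path}"]


if __name__ == "__main__":
    # Hypothetical inputs: fake timestamps and example.org namenodes.
    now = 1_579_000_000
    stats = [PathStat("/user/a/old", now - 200 * DAY_SECONDS),
             PathStat("/user/b/new", now - 1 * DAY_SECONDS)]
    hot, cold = split_hot_cold(stats, now)
    for p in cold:
        print(" ".join(distcp_command(p, "old-nn.example.org:8020",
                                      "new-nn.example.org:8020")))
```

Using `-update` lets the same commands be re-run later to copy only files that changed since the previous pass; the hot paths are left for the coordinated, per-user distcp of step 4.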