Hi Luca,

It would be nice if you documented your steps and made them available to the community. I think many other users would be interested.

JMS

On Wed, Jan 15, 2020 at 11:55, Evans Ye <evan...@apache.org> wrote:

> Let me answer some parts w/ my best effort and let the others add the
> things I'm missing.
>
> Luca Toscano <toscano.l...@gmail.com> wrote on Wed, Jan 15, 2020 at 9:25 PM:
>
>> Hi everybody,
>>
>> I am part of the Analytics team of the Wikimedia Foundation. We are
>> currently managing a CDH 5.16.1 Hadoop cluster (on Debian 9 Stretch
>> hosts), and for various reasons we'd love to explore the possibility
>> of moving to BigTop :)
>>
>> The long term plan that we have in mind is something like the following:
>>
>> - Move from CDH 5.16.1 to BigTop 1.4 (that IIUC is the last Hadoop 2.x
>>   release)
>> - Upgrade to BigTop 1.5 (very delicate since IIUC it upgrades Hadoop to
>>   3.x)
>
> The 1.5 BOM is not finalized yet. It can be Hadoop 2.X or Hadoop 3,
> depending on whether Hadoop 3 packaging is solved. You can refer to the
> discussion thread [1].
> Basically what I can recall is that because of Hadoop 3's shell script
> rewrite, the packaging is really a challenge. Some of our folks have done
> a POC at [2] but the problem is not fully solved.
>
>> - Upgrade the OS to Debian 10 Buster
>>
>> All the BigTop packages seem to be enough for our use cases (we
>> already have our own puppet automation), the only thing left would be
>> Hue but it is easy to package it (or re-use the CDH version as an interim
>> solution). I have a couple of questions for you:
>
> Hue was included up to and including 1.2.1, but was dropped as of the 1.3
> release. It was done in [3].
> I can't recall the reason for dropping Hue though...
>
>> 1) Has anybody attempted something similar in the past? If so, is
>> there any documentation and/or advice about how to do the migration?
>> From what I gathered CDH is based upon BigTop so the only difference
>> would be the Hadoop version (2.6 vs 2.8.5, but CDH's one is heavily
>> patched so not sure what version it could be compared to).
>> Hive also changes between the distros (1.1 vs 2.x), but we are looking
>> forward to the upgrade!
>
> 5 years ago my company was on CDH4 (packages with our own puppet, no CM)
> and we decided to move to Bigtop 0.8.
> You can refer to [4][5]. Basically the idea is:
> 1. Do a parallel migration.
> 2. Categorize data into hot and cold data.
> 3. [cold data] Establish Kerberos federation and move the cold data via
>    cross-cluster distcp under the hood.
> 4. [hot data] distcp with the users' cooperation.
>
>> 2) Is there any documentation about how to move from Hadoop 2 to
>> Hadoop 3 using BigTop? As far as I know the procedure is very delicate
>> and needs to be done with precise steps (I am mostly concerned about HDFS
>> consistency).
>
> No. My suggestion is to refer to Hadoop's or Cloudera's upgrade guide.
> I've previously done an upgrade from Hadoop 1.X to 2.X. Basically you just
> follow the guide. If you have a staging cluster, try on that one first.
>
>> 3) As far as I know Debian 10 (Buster) ships only with openjdk-11, but
>> we were planning to keep using openjdk-8 for the near/medium-term.
>> From
>> https://github.com/apache/bigtop/blob/master/bigtop_toolchain/manifests/jdk.pp#L25-L41
>> it seems that BigTop is aligned with this goal, but better to double
>> check.
>
> AFAIK this is aligned, however things are subject to change so no
> guarantee before the release has been made ;)
>
>> Thanks in advance!
>>
>> Luca
>
> Let me know if you have more questions :)
>
> [1] https://lists.apache.org/thread.html/2f80388a1f87bed20de2bb61882e734d76623896812cd0ae168b8ff5%40%3Cdev.bigtop.apache.org%3E
> [2] https://github.com/apache/bigtop/tree/bigtop-alpha
> [3] https://issues.apache.org/jira/browse/BIGTOP-3021
> [4] https://www.slideshare.net/takeshi_miao/zerodowntime-hadoophbase-crossdatacenter-migration?qid=be058c6e-a799-4a8c-bfa4-35f599074482&v=&b=&from_search=1
> [5] https://www.slideshare.net/YafangChang/hadoopcon2015-multicluster-live-synchronization-with-kerberos-federated-hadoop?qid=e5c77fd9-5038-4fe1-b233-5b025767c763&v=&b=&from_search=1
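
For anyone who ends up documenting their steps: the hot/cold split and the cross-cluster distcp from the migration outline quoted above could be sketched roughly as below. This is only an illustration, not what was actually run in that migration. The `Dataset` class, the 90-day threshold, and the NameNode URIs are all hypothetical, and the thread does not say how hot and cold data were told apart; last-modified age is simply one plausible criterion. The sketch only builds the `hadoop distcp` argument list and does not execute anything.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Dataset:
    """Hypothetical record describing one HDFS directory to migrate."""
    path: str
    last_modified: datetime


def split_hot_cold(datasets, now, cold_after=timedelta(days=90)):
    """Split datasets into (hot, cold) by age since last modification.

    The 90-day default is an arbitrary assumption for illustration.
    """
    hot, cold = [], []
    for ds in datasets:
        if now - ds.last_modified >= cold_after:
            cold.append(ds)
        else:
            hot.append(ds)
    return hot, cold


def distcp_command(src_namenode, dst_namenode, path):
    """Build (but do not run) a distcp invocation for one path.

    -update copies only files that are missing or differ on the target,
    -p asks distcp to preserve file attributes.
    """
    return [
        "hadoop", "distcp", "-update", "-p",
        f"{src_namenode}{path}",
        f"{dst_namenode}{path}",
    ]
```

Cold datasets could then be copied in the background with the generated commands, while hot datasets would be scheduled together with the users who own them, as in steps 3 and 4 of the outline.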