Let me answer some parts as best I can, and let others add anything I'm
missing.

On Wed, Jan 15, 2020 at 9:25 PM, Luca Toscano <toscano.l...@gmail.com> wrote:

> Hi everybody,
>
> I am part of the Analytics team of the Wikimedia Foundation. We are
> currently managing a CDH 5.16.1 Hadoop cluster (on Debian 9 Stretch
> hosts), and for various reasons we'd love to explore the possibility
> of moving to BigTop :)
>
> The long term plan that we have in mind is something like the following:
>
> - Move from CDH 5.16.1 to BigTop 1.4 (that IIUC is the last Hadoop 2.x
> release)
> - Upgrade to BigTop 1.5 (very delicate since IIUC it upgrades Hadoop to
> 3.x)


The 1.5 BOM is not finalized yet. It could be Hadoop 2.x or Hadoop 3,
depending on whether the Hadoop 3 packaging is solved. You can refer to the
discussion thread [1].
As far as I recall, Hadoop 3's shell script rewrite makes the packaging a
real challenge. Some of our folks have done a POC at [2], but the problem is
not fully solved.

> - Upgrade the OS to Debian 10 Buster
>
> All the BigTop packages seem to be enough for our use cases (we
> already have our own puppet automation), the only thing left would be
> Hue but it is easy to package it (or re-use the CDH version as interim
> solution). I have a couple of questions for you:
>

Hue was included up to and including 1.2.1, but was dropped as of the 1.3
release. This was done in [3].
I can't recall the reason for dropping Hue, though...


> 1) Has anybody attempted something similar in the past? If so, there
> is some documentation and/or advice about how to do the migration?
> From what I gathered CDH is based upon BigTop so the only difference
> would be the Hadoop version (2.6 vs 2.8.5, but CDH's one is heavily
> patched so not sure what version it could be compared to). Hive also
> changes between the distro (1.1 vs 2.x), but we are looking forward to
> upgrade!
>

Five years ago my company was on CDH4 (packages with our own Puppet, no CM)
and we decided to move to Bigtop 0.8.
You can refer to [4][5]. Basically the idea is:
1. Do a parallel migration.
2. Categorize the data into hot and cold data.
3. [cold data] Establish Kerberos federation and move the cold data via
cross-cluster distcp under the hood.
4. [hot data] Run distcp with the users' cooperation.
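
A minimal sketch of steps 2 to 4, to make the idea concrete. It is not what
we actually ran: the 90-day threshold, the NameNode addresses, the paths and
the helper names are all made up for illustration, and it assumes Kerberos
trust between the two clusters is already in place. It only builds the
`hadoop distcp` command lines; it does not execute them.

```python
from datetime import datetime, timedelta

# Hypothetical threshold: paths not modified for 90 days count as "cold".
COLD_AFTER = timedelta(days=90)


def split_hot_cold(last_modified, now, cold_after=COLD_AFTER):
    """Split {path: last-modified datetime} into (hot, cold) sorted path lists."""
    hot, cold = [], []
    for path, mtime in sorted(last_modified.items()):
        (cold if now - mtime >= cold_after else hot).append(path)
    return hot, cold


def distcp_command(src_nn, dst_nn, path):
    """Build a `hadoop distcp` argv list copying one path between clusters.

    -update skips files that already match on the target;
    -pugp preserves user, group and permission.
    """
    return ["hadoop", "distcp", "-update", "-pugp",
            f"hdfs://{src_nn}{path}", f"hdfs://{dst_nn}{path}"]


if __name__ == "__main__":
    # Made-up inventory; in practice this would come from an HDFS listing.
    now = datetime(2020, 1, 15)
    inventory = {
        "/data/raw/2018": datetime(2018, 12, 31),
        "/data/raw/2020": datetime(2020, 1, 14),
    }
    hot, cold = split_hot_cold(inventory, now)
    for path in cold:
        cmd = distcp_command("old-nn.example:8020", "new-nn.example:8020", path)
        print(" ".join(cmd))
```

Cold paths can be copied in the background this way, while hot paths need a
cutover window agreed with their owners before the final `-update` pass.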


>
> 2) Is there any documentation about how to move from Hadoop 2 to
> Hadoop 3 using BigTop? As far as I know the procedure is very delicate
> and needs to be done with precise steps (I am mostly concerned of HDFS
> consistency).
>

No. My suggestion is to refer to the Hadoop or Cloudera upgrade guide.
I've previously done an upgrade from Hadoop 1.x to 2.x. Basically you just
follow the guide. If you have a staging cluster, try it on that one first.


>
> 3) As far as I know Debian 10 (Buster) ships only with openjdk-11, but
> we were planning to keep using openjdk-8 for the near/medium-term.
> From
> https://github.com/apache/bigtop/blob/master/bigtop_toolchain/manifests/jdk.pp#L25-L41
> it seems that BigTop is aligned with this goal, but better to double
> check.
>

AFAIK this is aligned; however, things are subject to change, so there is no
guarantee until the release has been made ;)


>
> Thanks in advance!
>
> Luca


Let me know if you have more questions :)

[1]
https://lists.apache.org/thread.html/2f80388a1f87bed20de2bb61882e734d76623896812cd0ae168b8ff5%40%3Cdev.bigtop.apache.org%3E

[2] https://github.com/apache/bigtop/tree/bigtop-alpha
[3] https://issues.apache.org/jira/browse/BIGTOP-3021
[4]
https://www.slideshare.net/takeshi_miao/zerodowntime-hadoophbase-crossdatacenter-migration?qid=be058c6e-a799-4a8c-bfa4-35f599074482&v=&b=&from_search=1
[5]
https://www.slideshare.net/YafangChang/hadoopcon2015-multicluster-live-synchronization-with-kerberos-federated-hadoop?qid=e5c77fd9-5038-4fe1-b233-5b025767c763&v=&b=&from_search=1
