Hi Luca,

It would be nice if you documented your steps and made them available to the
community. I think they would interest many other users.

JMS

On Wed, Jan 15, 2020 at 11:55, Evans Ye <evan...@apache.org> wrote:

> Let me answer some parts as best I can and let others add the
> things I'm missing.
>
> On Wed, Jan 15, 2020 at 9:25 PM, Luca Toscano <toscano.l...@gmail.com> wrote:
>
>> Hi everybody,
>>
>> I am part of the Analytics team of the Wikimedia Foundation. We are
>> currently managing a CDH 5.16.1 Hadoop cluster (on Debian 9 Stretch
>> hosts), and for various reasons we'd love to explore the possibility
>> of moving to BigTop :)
>>
>> The long term plan that we have in mind is something like the following:
>>
>> - Move from CDH 5.16.1 to BigTop 1.4 (that IIUC is the last Hadoop 2.x
>> release)
>> - Upgrade to BigTop 1.5 (very delicate since IIUC it upgrades Hadoop to
>> 3.x)
>
>
> The 1.5 BOM is not finalized yet. It can be Hadoop 2.X or Hadoop 3,
> depending on whether Hadoop 3 packaging is solved. You can refer to the
> discussion thread [1].
> As far as I can recall, Hadoop 3's shell script rewrite makes the
> packaging a real challenge. Some of our folks have done a POC at [2],
> but the problem is not fully solved.
>
>> - Upgrade the OS to Debian 10 Buster
>>
>> All the BigTop packages seem to be enough for our use cases (we
>> already have our own puppet automation), the only thing left would be
>> Hue but it is easy to package it (or re-use the CDH version as interim
>> solution). I have a couple of questions for you:
>>
>
> Hue was included up to and including 1.2.1, but has been dropped since the
> 1.3 release. That was done in [3].
> I can't recall the reason for dropping Hue, though...
>
>
>> 1) Has anybody attempted something similar in the past? If so, is
>> there any documentation and/or advice about how to do the migration?
>> From what I gathered CDH is based upon BigTop so the only difference
>> would be the Hadoop version (2.6 vs 2.8.5, but CDH's one is heavily
>> patched so not sure what version it could be compared to). Hive also
>> changes between the distros (1.1 vs 2.x), but we are looking forward to
>> upgrading!
>>
>
> 5 years ago my company was on CDH4 (packages with our own Puppet, no CM)
> and we decided to move to Bigtop 0.8.
> You can refer to [4][5]. Basically the idea is:
> 1. Do a parallel migration.
> 2. Categorize the data into hot and cold data.
> 3. [cold data] Establish Kerberos federation and move the cold data via
> cross-cluster distcp under the hood.
> 4. [hot data] Run distcp with the users' cooperation.
>
>
>>
>> 2) Is there any documentation about how to move from Hadoop 2 to
>> Hadoop 3 using BigTop? As far as I know the procedure is very delicate
>> and needs to be done with precise steps (I am mostly concerned about HDFS
>> consistency).
>>
>
> No. My suggestion is to refer to the Hadoop or Cloudera upgrade guide.
> I've previously done an upgrade from Hadoop 1.X to 2.X. Basically you just
> follow the guide. If you have a staging cluster, try it there first.
>
>
>>
>> 3) As far as I know Debian 10 (Buster) ships only with openjdk-11, but
>> we were planning to keep using openjdk-8 for the near/medium-term.
>> From
>> https://github.com/apache/bigtop/blob/master/bigtop_toolchain/manifests/jdk.pp#L25-L41
>> it seems that BigTop is aligned with this goal, but better to double
>> check.
>>
>
> AFAIK this is aligned; however, things are subject to change, so there is
> no guarantee until the release has been made ;)
>
>
>>
>> Thanks in advance!
>>
>> Luca
>
>
> Let me know if you have more questions :)
>
> [1]
> https://lists.apache.org/thread.html/2f80388a1f87bed20de2bb61882e734d76623896812cd0ae168b8ff5%40%3Cdev.bigtop.apache.org%3E
>
> [2] https://github.com/apache/bigtop/tree/bigtop-alpha
> [3] https://issues.apache.org/jira/browse/BIGTOP-3021
> [4]
> https://www.slideshare.net/takeshi_miao/zerodowntime-hadoophbase-crossdatacenter-migration?qid=be058c6e-a799-4a8c-bfa4-35f599074482&v=&b=&from_search=1
> [5]
> https://www.slideshare.net/YafangChang/hadoopcon2015-multicluster-live-synchronization-with-kerberos-federated-hadoop?qid=e5c77fd9-5038-4fe1-b233-5b025767c763&v=&b=&from_search=1
>
