On Fri, Jan 17, 2020 at 12:34 AM, Luca Toscano <toscano.l...@gmail.com> wrote:

> On Wed, Jan 15, 2020 at 17:55, Evans Ye <evan...@apache.org>
> wrote:
> >
> > Let me answer some parts as best I can and let the others add the
> > things I'm missing.
>
> Thanks a lot :)
>
> > On Wed, Jan 15, 2020 at 9:25 PM, Luca Toscano <toscano.l...@gmail.com> wrote:
> >>
> >> Hi everybody,
> >>
> >> I am part of the Analytics team of the Wikimedia Foundation. We are
> >> currently managing a CDH 5.16.1 Hadoop cluster (on Debian 9 Stretch
> >> hosts), and for various reasons we'd love to explore the possibility
> >> of moving to BigTop :)
> >>
> >> The long term plan that we have in mind is something like the following:
> >>
> >> - Move from CDH 5.16.1 to BigTop 1.4 (which IIUC is the last Hadoop 2.x
> >>   release)
> >> - Upgrade to BigTop 1.5 (very delicate, since IIUC it upgrades Hadoop to
> >>   3.x)
> >
> >
> > The 1.5 BOM is not finalized yet. It can be Hadoop 2.x or Hadoop 3,
> > depending on whether Hadoop 3 packaging is solved. You can refer to the
> > discussion thread [1].
> > As far as I can recall, packaging is a real challenge because of Hadoop 3's
> > shell script rewrite. Some of our folks have done a POC at [2], but the
> > problem is not fully solved.
>
> Makes sense, yes, but the long term plan is to eventually ship Hadoop 3,
> right? It seems an obvious question, I know, but better to double check.
>

Though everything should be decided by the community, I'm pretty sure that
we'll be on Hadoop 3 eventually, unless we can jump directly to Hadoop
4 ;)


>
> >> 1) Has anybody attempted something similar in the past? If so, is there
> >> any documentation and/or advice about how to do the migration?
> >> From what I gathered, CDH is based upon BigTop, so the only difference
> >> would be the Hadoop version (2.6 vs 2.8.5, but CDH's one is heavily
> >> patched, so I'm not sure what version it could be compared to). Hive also
> >> changes between the distros (1.1 vs 2.x), but we are looking forward to
> >> upgrading!
> >
> > 5 years ago my company was on CDH4 (pkgs w/ our own puppet, no CM) and we
> > decided to move to Bigtop 0.8.
>

A correction: not my company, but the company I worked for. Sorry for my poor
English...


> > You can refer to [4][5]. Basically the idea is:
> > 1. Do a parallel migration.
> > 2. Categorize data into hot and cold data.
> > 3. [cold data] Establish Kerberos federation and move the cold data via
> >    cross-cluster distcp under the hood.
> > 4. [hot data] distcp with the users' cooperation.
>
> Really nice links. We use Kerberos (single realm) as well, so it was
> an interesting read. We don't have super strict availability
> concerns for Hadoop; usually we take our cluster offline for one or two
> hours if needed for important migrations (CDH upgrades, Java upgrades,
> Kerberos, etc., just to name a few). Would it be possible, in your
> opinion, to upgrade in place, swapping CDH packages with BigTop ones?


We have experience with an in-place upgrade as well. The whole story is that
we were on Hadoop 1.X (CDH3) and upgraded in place to Hadoop 2.X (CDH4), then
we migrated to Bigtop 0.8 w/ a parallel migration, as you saw in the slides.

For Hadoop 1.X to Hadoop 2.X the binary format was different, hence we needed
to go through an upgrade w/ downtime to convert HDFS blocks to the newer
format. That took roughly 2X storage during the migration, until the upgrade
was finalized. Anyhow, the upgrade went well.

Though the path is different, I think it's possible to do the upgrade from
CDH's 2.6 to Bigtop's 2.8. What you need to do is just follow the official
Hadoop guide for upgrading from 2.6 to 2.8. The only thing that I can
imagine making the migration fail is Cloudera having done something in CDH
that makes it incompatible with open source Hadoop... which is pretty much
impossible...

My suggestion is to seek advice from the Hadoop community on the 2.6 to 2.8
upgrade. They should give you the most knowledgeable answer.

To share our experience with the in-place upgrade: we first wrote down a
runbook and then did as many rehearsals as we could on the staging cluster.
You can do upgrade-rollback-upgrade-rollback many times before you finalize
the upgrade. Another tip is to make sure your production cluster is in a
healthy state before the upgrade, so that you have less risk of ending up
with missing blocks. Make sure there's no large amount of re-replication
going on caused by failed disks or datanodes being down.
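
As a rough illustration of that pre-upgrade health check, here is a minimal
Python sketch. It is not something we actually ran; it assumes the block
counters appear in the output of `hdfs dfsadmin -report` under the labels
listed in COUNTERS (as in the Hadoop 2.x releases I have seen, but please
verify on your version). The function names and the `max_under_replicated`
threshold are hypothetical.

```python
import re
import subprocess

# Counter labels as printed by `hdfs dfsadmin -report`. These labels are an
# assumption based on Hadoop 2.x output and may differ between versions.
COUNTERS = (
    "Under replicated blocks",
    "Blocks with corrupt replicas",
    "Missing blocks",
)


def parse_block_counters(report_text):
    """Return {label: count} for each counter line found in the report."""
    counts = {}
    for label in COUNTERS:
        pattern = r"^\s*%s:\s*(\d+)\s*$" % re.escape(label)
        match = re.search(pattern, report_text, re.MULTILINE)
        if match:
            counts[label] = int(match.group(1))
    return counts


def ready_for_upgrade(counts, max_under_replicated=0):
    """Hypothetical policy: no missing or corrupt blocks, and at most
    max_under_replicated under-replicated blocks."""
    if set(counts) != set(COUNTERS):
        return False  # a counter was not found, so do not guess
    return (
        counts["Missing blocks"] == 0
        and counts["Blocks with corrupt replicas"] == 0
        and counts["Under replicated blocks"] <= max_under_replicated
    )


def fetch_report():
    """Run the real CLI. Needs an HDFS client and sufficient privileges."""
    result = subprocess.run(
        ["hdfs", "dfsadmin", "-report"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

On a host with an HDFS client you would call
`ready_for_upgrade(parse_block_counters(fetch_report()))` and only proceed
with the runbook when it returns True.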


> We have
> a testing/staging cluster to use as a playground, and today I tried to
> replace CDH 5.16.2 packages with BigTop's 1.4 ones on one Hadoop test
> worker (I know, very brutal and not elegant, but it was a test :). The
> HDFS datanode and journalnode daemons came up fine, but the Yarn Node
> Manager did not, due to protocol buffer mismatch issues (I think due to
> https://issues.apache.org/jira/browse/YARN-8310). It was a good result
> in my opinion. My next step would be to stop the whole (test) cluster
> and also upgrade the other nodes, to see what works and what does not. The
> HDFS Namenode's consistency is my first thought of course, but it
> should be like a 2.6 -> 2.8 upgrade in theory. What do you think?
>

Yes, in theory. Again, I suggest seeking advice from the Hadoop community on
the 2.6 to 2.8 upgrade, then doing rehearsals to minimize the risks.


>
> On Wed, Jan 15, 2020 at 18:10, Jean-Marc Spaggiari
> <jean-m...@spaggiari.org> wrote:
> >
> > Hi Lucas,
> >
> > Might be nice if you document your steps and make them available to the
> > community. I think it might interest many other users.
>
> I will take care of it, yes! One thing that I am looking forward to is
> being part of this community and contributing back :)
>

We'd be happy to share your success story of moving to Bigtop. If you
find anything like a bug or a possible feature improvement, don't hesitate to
report it back to the community. We'd love to see your contributions and to
have you become part of the community, or even go further and pave the future
of Bigtop with us :)


>
> Luca
>
