We are not paying for CDH -- our older version of CDH (5.16.2) predates the licensing change, and we've never used CM. We are planning to migrate off of CDH onto Apache, and have 10+ years of experience working with HBase internals and operating HBase at scale.

I'm curious whether anyone knows of any incompatibilities in the replication layer between these two versions (cdh5.16.2 and Apache HBase 2.4), as that is not very well covered in the public docs afaict. I'm aware this will likely be a multi-month or even year-plus project for us, and am just starting the investigation phase :) It honestly looks like it might be an easier project than the pre-0.96 to 1.x upgrade we undertook years ago, though we're at a different scale today.
On Wed, May 19, 2021 at 9:17 AM Marc Hoppins <marc.hopp...@eset.sk> wrote:

> If you are paying for CDH then just upgrade via Cloudera Manager. If you
> are not paying for it then I think you will find it a huge problem.
>
> Upgrade may have to be done using a version 6 then a newer version to get
> to a suitable HBase/Hadoop version.
>
> We are currently on CDH6.3.2 but the HBase is an extremely useless version
> (2.1.0) and we are not in the business of generating income from the data
> so cannot justify the exorbitant cost per node that Cloudera are asking for
> later versions.
>
> -----Original Message-----
> From: Bryan Beaudreault <bbeaudrea...@hubspot.com.INVALID>
> Sent: Wednesday, May 19, 2021 2:49 PM
> To: user@hbase.apache.org
> Subject: Upgrading cdh5.16.2 to apache hbase 2.4 using replication
>
> We are running about 40 HBase clusters, with over 5000 regionservers total.
> These are all running cdh5.16.2. We also have thousands of clients (from
> APIs to kafka workers to hadoop jobs, etc) hitting these various clusters,
> also running cdh5.16.2.
>
> We are starting to plan an upgrade to hbase 2.x and hadoop 3.x. I've read
> through the docs on https://hbase.apache.org/book.html#_upgrade_paths,
> and am starting to plan our approach. More than a few seconds of downtime
> is not an option, but rolling upgrade also seems risky (if not impossible
> for our version).
>
> One thought I had is whether replication is compatible between these two
> versions. If so, we probably would consider swapping onto upgraded clusters
> using backup/restore + replication. If we were to go this route we'd
> probably want to consider bi-directional replication so that we can roll
> back to the old cluster if there's a regression.
>
> Does anyone have any experience with this approach? Is the replication
> protocol compatible across these versions? Any concerns, tips or other
> considerations to keep in mind? We do the backup/restore + replication
> approach pretty regularly to move tables between clusters.
>
> Thanks!