Re: HBase 2.4.x + Spark 3.3

2022-10-24 Thread Wei-Chiu Chuang
This looks similar to what we are seeing with Ozone running with Spark 3.3
(HDDS-6926).

My guess is that Spark 3.3 switched to the shaded Hadoop client jars, in
which the protobuf message classes are relocated, and that somehow breaks
applications using the old, unshaded Hadoop classes.

If you can share some stack traces, that would be great.
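
That shading hypothesis can be checked directly. A minimal sketch (the jar
name and path are assumptions; adjust them to your Spark installation): list
whether the Hadoop client jar that ships with Spark contains protobuf classes
relocated under the `org.apache.hadoop.thirdparty` prefix.

```shell
# List relocated (shaded) protobuf classes inside the Hadoop client jar
# bundled with Spark 3.3. The jar name/path is an assumption; adjust it
# to your installation.
unzip -l "$SPARK_HOME"/jars/hadoop-client-api-3.3.2.jar \
  | grep 'org/apache/hadoop/thirdparty/protobuf' \
  | head -n 5
```

If the grep matches, the protobuf message classes are indeed relocated, and
code compiled against the unshaded `com.google.protobuf` types could fail to
link at runtime.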




Re: HBase 2.4.x + Spark 3.3

2022-10-24 Thread Lars Francke
Hi Andrew,

Okay, we'll try that and report back when/if we get it working.

Cheers,
Lars


Re: HBase 2.4.x + Spark 3.3

2022-10-19 Thread Andrew Purtell
No, that is insufficient. HBase must be recompiled against Hadoop 3 first:

cd /path/to/hbase
mvn clean install assembly:single -DskipTests -Dhadoop.profile=3.0 -Dhadoop-three.version=XXX

Then, once the results are in your local Maven cache or Nexus instance, you
can compile the Spark connector as indicated.
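
Putting the two steps together, a sketch of the end-to-end sequence (paths
and version numbers below are example assumptions, not tested values; the
connector lives in the hbase-connectors repository):

```shell
# Step 1: rebuild HBase against Hadoop 3 and install the artifacts into the
# local Maven repository. Paths and versions are assumptions; substitute
# your own.
cd /path/to/hbase
mvn clean install assembly:single -DskipTests \
    -Dhadoop.profile=3.0 -Dhadoop-three.version=3.3.2

# Step 2: build the Spark connector (hbase-connectors) against those
# freshly installed HBase artifacts, matching the versions you ship.
cd /path/to/hbase-connectors
mvn clean package -DskipTests \
    -Dspark.version=3.3.0 -Dscala.version=2.12.15 -Dscala.binary.version=2.12 \
    -Dhadoop-three.version=3.3.2 -Dhbase.version=2.4.14
```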




-- 
Best regards,
Andrew

Unrest, ignorance distilled, nihilistic imbeciles -
It's what we’ve earned
Welcome, apocalypse, what’s taken you so long?
Bring us the fitting end that we’ve been counting on
   - A23, Welcome, Apocalypse


Re: HBase 2.4.x + Spark 3.3

2022-10-19 Thread Lars Francke
Hi Andrew,

thanks for the reply.
I should have been more specific: we only tried to compile the "client"
part that is used in Spark itself, and we used the proper versions:

mvn -Dspark.version=XXX -Dscala.version=XXX -Dhadoop-three.version=XXX
-Dscala.binary.version=XXX -Dhbase.version=XXX clean package

I assume that should pull in the correct dependencies, but I have to admit
I didn't check; I took it straight from the readme.
We wanted to try the server bit for the RegionServers afterwards, but we
haven't even gotten to it yet.

We have this on our radar though and might try to work through those issues
at some point.
If we get started on that I'll ping the list.

Cheers,
Lars


Re: HBase 2.4.x + Spark 3.3

2022-10-18 Thread Andrew Purtell
Out-of-the-box use is going to be problematic without recompiling HBase for
Hadoop 3. Spark 3.3 ships with Hadoop 3.3.2. Apache HBase 2.4.x (and all
2.x) releases are compiled against Hadoop 2. Link errors
(ClassNotFoundException, NoClassDefFoundError, etc.) are, I think, to be
expected because the class hierarchies of various Hadoop classes have been
incompatibly changed in 3.x releases relative to 2.x. This is not
unreasonable; semantic versioning suggests breaking changes can be expected
in a major version increment.

Users probably need to do a holistic (or hermetic, if you prefer) build of
their bill of materials before testing it or certainly before shipping it.
Build your HBase for the version of Hadoop you are actually shipping it
with, as opposed to whatever the upstream project picks as a default build
target. They are called "convenience binaries" by the project and the
Foundation for a reason. Convenience may vary according to your
circumstances. When HBase finally ships builds compiled against Hadoop 3 by
default, anyone still using 2.x in production will face the same problem
(in reverse). The Phoenix project also faces this issue, for what it's
worth: their readme and build instructions walk users through rebuilding
HBase with -Dhadoop.profile=3.0 as a first step as well.
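
One way to sanity-check such a rebuild before wiring everything together (a
hypothetical check; the path is an assumption) is to ask Maven which Hadoop
artifacts the rebuilt HBase tree actually resolves:

```shell
# Hypothetical sanity check: confirm the rebuilt HBase tree resolves
# Hadoop 3 artifacts. The path is an assumption; run from your HBase
# checkout after the -Dhadoop.profile=3.0 rebuild.
cd /path/to/hbase
mvn -q dependency:tree -Dincludes=org.apache.hadoop:hadoop-common \
    -Dhadoop.profile=3.0 | grep hadoop-common
```

If the resolved hadoop-common version is still 2.x, the profile did not take
effect and the link errors will persist.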


On Mon, Oct 17, 2022 at 1:52 PM Lars Francke wrote:

> Hi everyone,
>
> we've just recently tried getting the HBase Spark connector running against
> Spark 3.3 and HBase 2.4.x and failed miserably. It was a mess of Scala and
> Java issues, classpath, NoClassDef etc.
>
> The trauma is too recent for me to dig up the details but if needed I can
> ;-)
>
> For now I'm just wondering if anyone has succeeded using this combination?
>
> Cheers,
> Lars
>


-- 
Best regards,
Andrew
