Re: [DISCUSSION] Upgrading core dependencies

2017-02-11 Thread Andrew Purtell
Minor point, but I maintain we don't want to make coprocessors like OSGi or 
built on OSGi. I think we still want to scope them as extension mixins, not an 
inner platform. We see the limitations (limited API compatibility guarantees 
for internals, by definition) over on Phoenix, but it's the right trade-off for 
HBase in my opinion. We can still help implementors by refactoring to stable 
supported interfaces on a case-by-case basis as motivated, like what we did 
with HRegion -> Region. 

Let's get rid of all Guava types in any public or LP API. 



Re: [DISCUSSION] Upgrading core dependencies

2017-02-08 Thread Jerry He
My thinking was more about avoiding/reducing the hbase dependencies pulled
into hbase-spark, and that maybe hbase-spark could even depend on the shaded
client and server -- which will be easier and more feasible if the shaded
client becomes the default, as you mentioned.

Your idea that hbase-spark itself becomes a shaded artifact sounds even better,
if I understand you correctly. The spark/scala dependencies are 'provided'
already.
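
['Provided' scope means hbase-spark compiles against Spark but does not
bundle it; the runtime environment supplies it. A sketch of such a
declaration -- the artifactId and version property here are illustrative,
not the module's actual pom:]

```xml
<!-- Compile against Spark, but let the runtime environment supply it. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>${spark.version}</version>
  <scope>provided</scope>
</dependency>
```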

Jerry





Re: [DISCUSSION] Upgrading core dependencies

2017-02-08 Thread Nick Dimiduk
On Wed, Feb 8, 2017 at 10:24 AM Jerry He  wrote:

> Yeah.  Talking about the dependency: the hbase-spark module already
> depends on hbase-server (coming from the Spark bulk load producing
> HFiles).
> This is not very good; we have to be careful not to entangle it more.
> Also, there are already problems running hbase-spark due to dependency
> conflicts, and one has to be careful about the order of the classpath to
> make it work.

We own the hbase-spark module, do we not? In that case, we control our own
destiny. An explicit goal of this effort would be to make use of that
module agnostic to classpath load order. As per my earlier reply,
hbase-spark could itself be an artifact shaded over hbase-server and all of
its dependencies. That way the user doesn't need to think about it at all.

It further seems to me that the maven-shade-plugin could gain a new analyze
goal, similar to that of the dependency-plugin, which would audit the
classes packaged in a jar against the contract defined in configuration. This
could further be used to fail the build if there are any warnings reported.
I've found myself wanting this very functionality as I consume ES and
Phoenix, shaded in downstream projects.
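
[For readers following along: the shading/relocation being discussed is
configured with the maven-shade-plugin roughly as below. The shadedPattern
prefix is illustrative, not what the project actually settled on; the
hypothetical analyze goal above would verify that the classes in the
resulting jar match what such a configuration promises.]

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <!-- Rewrite Guava references in the bundled bytecode to a
               project-private package so they cannot clash with the
               Guava that Hadoop/Spark put on the classpath. -->
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>org.apache.hbase.shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```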


Re: [DISCUSSION] Upgrading core dependencies

2017-02-08 Thread Josh Elser

(late to the party, but..)

+1 Nick sums this up better than I could have.

Nick Dimiduk wrote:

For the client: I'm a fan of shaded client modules by default and
minimizing the exposure of that surface area of 3rd party libs (none, if
possible). For example, Elasticsearch has a similar set of challenges; they
solve it by advocating users shade from step 1. It's addressed first thing
in the docs for their client libs. We could take it a step further by
making the shaded client the default client (o.a.hbase:hbase-client)
artifact and internally consume an hbase-client-unshaded. Turns the whole
thing on its head in a way that's better for the naive user.

For MR/Spark/etc connectors: We're probably stuck as it is until necessary
classes can be extracted from hbase-server. I haven't looked into this
lately, so I hesitate to give a prescription.

For coprocessors: They forfeit their right to 3rd party library dependency
stability by entering our process space. Maybe in 3.0 or 4.0 we can rebuild
on jigsaw or OSGi, but for today I think the best we should do is provide
relatively stable internal APIs. I also find it unlikely that we'd want to
spend loads of cycles optimizing for this usecase. There's other, bigger
fish, IMHO.

For size/compile time: I think these ultimately matter less than user
experience. Let's find a solution that sucks less for downstreamers and
work backward on reducing bloat.

On the point of leaning heavily on Guava: their pace is traditionally too
fast for us to expose in any public API. Maybe that's changing, in which
case we could reconsider for 3.0. Better to start using the new APIs
available in Java 8...

Thanks for taking this up, Stack.
-n

On Tue, Feb 7, 2017 at 12:22 PM Stack  wrote:


Here's an old thorny issue that won't go away. I'd like to hear what folks
are thinking these times.

My immediate need is that I want to upgrade Guava [1]. I want to move us to
guava 21.0, the latest release [2]. We currently depend on guava 12.0.
Hadoop's guava -- 11.0 -- is also on our CLASSPATH (three times). We could
just do it in an hbase-2.0.0, a major version release, but then
downstreamers and coprocessors that may have been a little lazy and that
have transitively come to depend on our versions of libs will break [3].
Then there is the murky area around running YARN/MR/Spark jobs, where the
ordering of libs on the CLASSPATH gets interesting and where fat-jarring or
command-line antics can get you over (most) problems if you persevere.

Multiply the above by netty, jackson, and a few other favorites.

Our proffered solution to the above is the shaded hbase artifact project;
have applications and tasks refer to the shaded hbase client instead.
Because we've not done the work to narrow the surface area we expose to
downstreamers, most consumers of our API -- certainly in a spark/MR context
since our MR utility is buried in hbase-server module still -- need both
the shaded hbase client and server on their CLASSPATH (i.e. near all of
hbase).

Leaving aside for the moment that our shaded client and server need
untangling, getting folks up on the shaded artifacts takes effort
evangelizing. We also need to be doing work to make sure our shading
doesn't leak dependencies, that it works for all deploy scenarios, and that
this route forward is well doc'd, and so on.

I don't see much evidence of our pushing the shaded artifacts route nor of
their being used. What is the perception of others?

I played with adding a new module to host shaded 3rd party libs[4]. The
downsides are a couple: internally we would have to refer to the relocated
(offset) version of each lib, and we bulk up our tarball by a bunch of megs
(the build gets a few seconds longer, not much). The upside is that we can
float over a variety of hadoop/spark versions using whatever guava or netty
we want; downstreamers and general users should have an easier time of it
too because they'll be less likely to run into library clashes. Is this
project worth finishing?

WDYT?
St.Ack

1. I wanted to make use of the protobuf to-json tool. It is in the
extra-jar, protobuf-util. It requires a guava 16.0.
2. Guava is a quality lib that should be at the core of all our dev but we
are gun shy around using it because it semver's with gusto at a rate that
is orders of magnitude in advance of the Hadoop/HBase cadence.
3. We are trying to minimize breakage when we go to hbase-2.0.0.
4. HBASE-15749 suggested this but was shut down because it made no case for
why we'd want to do it.
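
[For context, the 'shaded hbase artifact' route asks downstreamers to swap
one dependency for another; the coordinates below sketch the idea, with the
version number purely illustrative:]

```xml
<!-- Instead of org.apache.hbase:hbase-client, consume the shaded artifact,
     which bundles and relocates third-party deps like Guava and Netty. -->
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-shaded-client</artifactId>
  <version>1.3.0</version>
</dependency>
```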





Re: [DISCUSSION] Upgrading core dependencies

2017-02-08 Thread Jerry He
Yeah.  Talking about the dependency: the hbase-spark module already
depends on hbase-server (coming from the Spark bulk load producing
HFiles).
This is not very good; we have to be careful not to entangle it more.
Also, there are already problems running hbase-spark due to dependency
conflicts, and one has to be careful about the order of the classpath to
make it work.

Jerry
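
[The classpath-order workarounds referred to above usually come down to
telling the framework to prefer the job's jars over the cluster's; a
sketch, with hypothetical job jar names:]

```shell
# MapReduce: put the job's jars ahead of the cluster's on the task classpath.
hadoop jar my-hbase-mr-job.jar MyDriver \
  -Dmapreduce.job.user.classpath.first=true

# Spark: the analogous (experimental) switches.
spark-submit \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  my-hbase-spark-job.jar
```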


Re: [DISCUSSION] Upgrading core dependencies

2017-02-07 Thread Stack
On Tue, Feb 7, 2017 at 7:48 PM, Ted Yu  wrote:

> bq. Better to start using the new API's available in Java 8
>
> +1 to the above.
> If no new Guava construct is introduced and we replace current Guava usage
> with Java 8 counterpart(s), in the future we can get rid of Guava
> dependency.
>
>
I don't follow. JDK8 is not a superset of the Guava lib (or of netty, jackson,
etc.).
S






Re: [DISCUSSION] Upgrading core dependencies

2017-02-07 Thread Stack
Thanks Nick and Duo.

See below.

On Tue, Feb 7, 2017 at 6:50 PM, Nick Dimiduk  wrote:

> For the client: I'm a fan of shaded client modules by default and
> minimizing the exposure of that surface area of 3rd party libs (none, if
> possible). For example, Elasticsearch has a similar set of challenges; they
> solve it by advocating users shade from step 1. It's addressed first thing
> in the docs for their client libs. We could take it a step further by
> making the shaded client the default client (o.a.hbase:hbase-client)
> artifact and internally consume an hbase-client-unshaded. Turns the whole
> thing on its head in a way that's better for the naive user.
>
>
I like this idea. Let me try it out. Our shaded thingies are not 'air
tight' enough yet, I suspect, but maybe we can fix this. Making it so clients
don't have to include hbase-server too will be a little harder (will try
flipping this too so it's always shaded by default).


> For MR/Spark/etc connectors: We're probably stuck as it is until necessary
> classes can be extracted from hbase-server. I haven't looked into this
> lately, so I hesitate to give a prescription.
>
>
This was the last attempt, and the contributor did a good job of sizing the
effort: HBASE-11843.


> For coprocessors: They forfeit their right to 3rd party library dependency
> stability by entering our process space. Maybe in 3.0 or 4.0 we can rebuild
> on jigsaw or OSGi, but for today I think the best we should do is provide
> relatively stable internal APIs. I also find it unlikely that we'd want to
> spend loads of cycles optimizing for this usecase. There's other, bigger
> fish, IMHO.
>
>
Agree.


> For size/compile time: I think these ultimately matter less than user
> experience. Let's find a solution that sucks less for downstreamers and
> work backward on reducing bloat.
>

I like how you put it.


> On the point of leaning heavily on Guava: their pace is traditionally too
> fast for us to expose in any public API. Maybe that's changing, in which
> case we could reconsider for 3.0. Better to start using the new APIs
> available in Java 8...
>
>
I like what Duo says here: that we just not expose these libs in our API.

Yeah, we can use the new jdk8 APIs, but guava is something else (there is some
small overlap in functional idioms -- we can favor jdk8 there -- but guava
has a bunch more it'd be good to make use of).

Anyways, I was using Guava as illustration of a larger issue.

Thanks again for the input you two,
S







Re: [DISCUSSION] Upgrading core dependencies

2017-02-07 Thread Duo Zhang
For coprocessors, our compatibility matrix says that we can break
everything, so I think dependencies are not the first thing they need to
consider when upgrading between major versions?


Re: [DISCUSSION] Upgrading core dependencies

2017-02-07 Thread Ted Yu
bq. Better to start using the new APIs available in Java 8

+1 to the above.
If no new Guava construct is introduced and we replace current Guava usage
with Java 8 counterpart(s), we can get rid of the Guava dependency in the
future.
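
[A sketch of the kind of one-for-one replacement being proposed. The Guava
calls in the comments are the ones being swapped out; the class name is just
for illustration.]

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class GuavaToJdk8 {
    public static void main(String[] args) {
        // Guava: Joiner.on(",").join("a", "b", "c")
        String joined = String.join(",", "a", "b", "c");

        // Guava: Lists.transform(words, lengthFunction)
        List<Integer> lengths = Arrays.asList("one", "two", "three").stream()
                .map(String::length)
                .collect(Collectors.toList());

        // Guava: com.google.common.base.Optional.fromNullable(value)
        Optional<String> maybe = Optional.ofNullable(null);

        System.out.println(joined);            // a,b,c
        System.out.println(lengths);           // [3, 3, 5]
        System.out.println(maybe.isPresent()); // false
    }
}
```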



Re: [DISCUSSION] Upgrading core dependencies

2017-02-07 Thread Nick Dimiduk
For the client: I'm a fan of shaded client modules by default and
minimizing the exposure of that surface area of 3rd-party libs (none, if
possible). For example, Elasticsearch has a similar set of challenges; they
solve it by advocating that users shade from step 1. It's addressed first
thing in the docs for their client libs. We could take it a step further by
making the shaded client the default client (o.a.hbase:hbase-client)
artifact and internally consume an hbase-client-unshaded. Turns the whole
thing on its head in a way that's better for the naive user.
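
As a sketch of what that flip might look like (the relocation prefix below is
hypothetical, not an actual project coordinate), the shaded-by-default client
could relocate its 3rd-party packages via the maven-shade-plugin:

```xml
<!-- Hypothetical sketch: build the default client artifact shaded, relocating
     3rd-party packages so they cannot clash with a downstreamer's own copies. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <!-- e.g. Guava moves under an hbase-owned package (prefix illustrative) -->
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>org.apache.hadoop.hbase.shaded.com.google.common</shadedPattern>
          </relocation>
          <relocation>
            <pattern>io.netty</pattern>
            <shadedPattern>org.apache.hadoop.hbase.shaded.io.netty</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Downstreamers would then depend only on the shaded artifact and never see
our Guava or netty at all.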

For MR/Spark/etc connectors: We're probably stuck as-is until the necessary
classes can be extracted from hbase-server. I haven't looked into this
lately, so I hesitate to give a prescription.

For coprocessors: They forfeit their right to 3rd-party library dependency
stability by entering our process space. Maybe in 3.0 or 4.0 we can rebuild
on Jigsaw or OSGi, but for today I think the best we can do is provide
relatively stable internal APIs. I also find it unlikely that we'd want to
spend loads of cycles optimizing for this use case. There are other, bigger
fish, IMHO.
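
A minimal sketch of the "relatively stable internal APIs" idea (names
invented for illustration, not actual HBase signatures): a coprocessor hook
codes against a narrow interface, so the concrete internals behind it can
churn without breaking the hook.

```java
// Hypothetical sketch: a narrowed, audience-limited surface for coprocessors.
// "Region" here stands in for a stable interface carved out of internals.
interface Region {
    String getRegionNameAsString();
}

class ObserverExample {
    // The hook sees only the interface, never the concrete internal class.
    static String onFlush(Region r) {
        return "flushed " + r.getRegionNameAsString();
    }

    public static void main(String[] args) {
        Region r = () -> "t1,,12345.abc.";   // single-method interface, lambda ok
        System.out.println(onFlush(r));      // prints "flushed t1,,12345.abc."
    }
}
```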

For size/compile time: I think these ultimately matter less than user
experience. Let's find a solution that sucks less for downstreamers and
work backward on reducing bloat.

On the point of leaning heavily on Guava: their pace is traditionally too
fast for us to expose in any public API. Maybe that's changing, in which
case we could reconsider for 3.0. Better to start using the new APIs
available in Java 8...
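
To make that concrete, a few common Guava idioms side by side with their
JDK 8 replacements (a sketch; the Guava forms appear only in comments, so
nothing here needs Guava on the CLASSPATH):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.Optional;
import java.util.stream.Collectors;

class Java8InsteadOfGuava {

    // Guava: Joiner.on(",").join(parts)
    static String join(List<String> parts) {
        return String.join(",", parts);
    }

    // Guava: FluentIterable.from(words).filter(...) / Iterables.filter(...)
    static List<String> keepShort(List<String> words) {
        return words.stream()
                    .filter(w -> w.length() <= 4)
                    .collect(Collectors.toList());
    }

    // Guava: Optional.fromNullable(v).or("default")
    static String orDefault(String v) {
        return Optional.ofNullable(v).orElse("default");
    }

    // Guava: Preconditions.checkNotNull(v, msg)
    static <T> T notNull(T v) {
        return Objects.requireNonNull(v, "value must not be null");
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList("a", "b", "c")));               // a,b,c
        System.out.println(keepShort(Arrays.asList("hbase", "mr", "spark"))); // [mr]
        System.out.println(orDefault(null));                                  // default
    }
}
```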

Thanks for taking this up, Stack.
-n

On Tue, Feb 7, 2017 at 12:22 PM Stack  wrote:

> Here's an old thorny issue that won't go away. I'd like to hear what folks
> are thinking these times.
>
> My immediate need is that I want to upgrade Guava [1]. I want to move us to
> guava 21.0, the latest release [2]. We currently depend on guava 12.0.
> Hadoop's guava -- 11.0 -- is also on our CLASSPATH (three times). We could
> just do it in an hbase-2.0.0, a major version release, but then
> downstreamers and coprocessors that may have been a little lazy and that
> have transitively come to depend on our versions of libs will break [3].
> Then there is the murky area around running YARN/MR/Spark jobs, where the
> ordering of libs on the CLASSPATH gets interesting; fat-jarring or
> command-line antics can get you over (most) problems if you persevere.
>
> Multiply the above by netty, jackson, and a few other favorites.
>
> Our proffered solution to the above is the shaded hbase artifact project;
> have applications and tasks refer to the shaded hbase client instead.
> Because we've not done the work to narrow the surface area we expose to
> downstreamers, most consumers of our API -- certainly in a spark/MR
> context, since our MR utility is still buried in the hbase-server module --
> need both the shaded hbase client and server on their CLASSPATH (i.e.
> nearly all of hbase).
>
> Leaving aside for the moment that our shaded client and server need
> untangling, getting folks up on the shaded artifacts takes effort
> evangelizing. We also need to be doing work to make sure our shading
> doesn't leak dependencies, that it works for all deploy scenarios, and that
> this route forward is well doc'd, and so on.
>
> I don't see much evidence of our pushing the shaded artifacts route nor of
> their being used. What is the perception of others?
>
> I played with adding a new module to host shaded 3rd-party libs [4].
> There are a couple of downsides: internally we would have to refer to the
> relocated (offset) version of each lib, and we bulk up our tarball by a
> bunch of megs (the build gets a few seconds longer, not much). The upside
> is that we can float over a variety of hadoop/spark versions using
> whatever guava or netty we want; downstreamers and general users should
> have an easier time of it too because they'll be less likely to run into
> library clashes. Is this project worth finishing?
>
> WDYT?
> St.Ack
>
> 1. I wanted to make use of the protobuf to-json tool. It is in the
> extra-jar, protobuf-util. It requires Guava 16.0.
> 2. Guava is a quality lib that should be at the core of all our dev, but
> we are gun-shy about using it because it semver's with gusto, at a rate
> orders of magnitude ahead of the Hadoop/HBase cadence.
> 3. We are trying to minimize breakage when we go to hbase-2.0.0.
> 4. HBASE-15749 suggested this but was shut down because it made no case
> for why we'd want to do it.
>


Re: [DISCUSSION] Upgrading core dependencies

2017-02-07 Thread Duo Zhang
I think we can upgrade these bad guys each time the major version changes,
and keep the versions fixed throughout the whole major version.

For the shading things, upgrading between major versions is always painful,
so I think a shaded client and a shaded server should be enough for users.
Maybe the current shaded client and server are not good enough yet, but I
think the approach is acceptable.

And one more thing: we need to make sure that we do not leak any classes
from these dependencies into our public API.
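
A toy illustration of the leak to avoid (method and types invented for this
sketch): only JDK types should cross the public API boundary, never e.g. a
Guava ListenableFuture, so clients are not pinned to our Guava version.

```java
import java.util.concurrent.CompletableFuture;

class ApiLeakSketch {
    // Leaky (don't): public ListenableFuture<String> fetch()
    //   -- that signature forces every client onto our Guava.
    // Clean: only JDK types appear in the signature.
    static CompletableFuture<String> fetch() {
        return CompletableFuture.completedFuture("row-contents");
    }

    public static void main(String[] args) {
        System.out.println(fetch().join()); // prints "row-contents"
    }
}
```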

Thanks.
