Re: [DISCUSS] [Java] Private shaded dependency uber jars

2018-02-02 Thread Romain Manni-Bucau
well we can disagree on the code - it is fine ;), but the needed part of it
by beam is not huge and in any case it can be forked without requiring 10
classes - if so we'll use another impl than the guava one ;). This is the
whole point.


Romain Manni-Bucau
@rmannibucau  |  Blog
 | Old Blog
 | Github  |
LinkedIn  | Book


2018-02-02 17:24 GMT+01:00 Reuven Lax :

> TypeToken is not trivial. I've written code to do what TypeToken does
> before (figuring out generic ancestor types). It's actually somewhat
> tricky, and the code I wrote had subtle bugs in it; eventually we removed
> this code in favor of Guava's implementation :)
>
> On Fri, Feb 2, 2018 at 7:47 AM, Romain Manni-Bucau 
> wrote:
>
>> Yep, note I never said to reinvent the wheel, we can copy it from guava,
>> openwebbeans or any other impl. Point was more to avoid to depend on
>> something we don't own for that which is after all not that much code. I
>> also think we can limit it a lot to align it on what is supported by beam
>> (I'm thinking to coders here) but this can be another topic.
>>
>>
>> Romain Manni-Bucau
>> @rmannibucau  |  Blog
>>  | Old Blog
>>  | Github
>>  | LinkedIn
>>  | Book
>> 
>>
>> 2018-02-02 16:33 GMT+01:00 Kenneth Knowles :
>>
>>> On Fri, Feb 2, 2018 at 7:18 AM, Romain Manni-Bucau <
>>> rmannibu...@gmail.com> wrote:
>>>
 Don't forget beam doesn't support much behind it (mainly only a few
 ParameterizedType due the usage code path) so it is mainly only about
 handling parameterized types and typevariables recursively. Not a lot of
 work. Main concern being it is in the API so using a shade as an API is
 never a good idea, in particular cause the shade can be broken and requires
 to setup clirr or things like that and when it breaks you are done and need
 to fork it anyway. Limiting the dependencies for an API - as beam is - is
 always saner even if it requires a small fork of code.

>>>
>>> The main thing that TypeToken is used for is capturing generics that are
>>> lost by Java reflection. It is a bit tricky, actually.
>>>
>>> Kenn
>>>
>>>
>>>

 Romain Manni-Bucau
 @rmannibucau  |  Blog
  | Old Blog
  | Github
  | LinkedIn
  | Book
 

 2018-02-02 15:49 GMT+01:00 Kenneth Knowles :

> On Fri, Feb 2, 2018 at 6:41 AM, Romain Manni-Bucau <
> rmannibu...@gmail.com> wrote:
>
>>
>>
>> 2018-02-02 15:37 GMT+01:00 Kenneth Knowles :
>>
>>> Another couple:
>>>
>>>  - User-facing TypeDescriptor is a very thin wrapper on Guava's
>>> TypeToken
>>>
>>
>> Technically reflect Type is enough
>>
>
> If you want to try to remove TypeToken from underneath TypeDescriptor,
> I have no objections as long as you expand the test suite to cover all the
> functionality where we trust TypeToken's tests. Good luck :-)
>
> Kenn
>
>
>>
>>
>>>  - ImmutableList and friends and their builders are very widely used
>>> and IMO still add a lot for readability and preventing someone coming 
>>> along
>>> and adding mistakes to a codebase
>>>
>>
>> Sugar but not required. When you compare the cost of a shade versus
>> of duplicating the parts we need there is no real match IMHO.
>>
>>
>>>
>>> So considering it all, I would keep a vendored Guava (but also move
>>> off where we can, and also have our own improvements). I hope it will 
>>> be a
>>> near-empty build file to generate it so not a maintenance burden.
>>>
>>> Kenn
>>>
>>> On Thu, Feb 1, 2018 at 8:44 PM, Kenneth Knowles 
>>> wrote:
>>>
 Nice. This sounds like a great idea (or two?) and goes along with
 what I just started for futures.

 Guava: filed https://issues.apache.org/jira/browse/BEAM-3606 and
 assigned to Ismaël :-) and converted my futures thing to a subtask.

 Specific things for our micro guava:

  - checkArgumentNotNull can throw the right exception
  - our own Optional because Java's is not Serializable

Re: [DISCUSS] [Java] Private shaded dependency uber jars

2018-02-02 Thread Reuven Lax
TypeToken is not trivial. I've written code to do what TypeToken does
before (figuring out generic ancestor types). It's actually somewhat
tricky, and the code I wrote had subtle bugs in it; eventually we removed
this code in favor of Guava's implementation :)

On Fri, Feb 2, 2018 at 7:47 AM, Romain Manni-Bucau 
wrote:

> Yep, note I never said to reinvent the wheel, we can copy it from guava,
> openwebbeans or any other impl. Point was more to avoid to depend on
> something we don't own for that which is after all not that much code. I
> also think we can limit it a lot to align it on what is supported by beam
> (I'm thinking to coders here) but this can be another topic.
>
>
> Romain Manni-Bucau
> @rmannibucau  |  Blog
>  | Old Blog
>  | Github
>  | LinkedIn
>  | Book
> 
>
> 2018-02-02 16:33 GMT+01:00 Kenneth Knowles :
>
>> On Fri, Feb 2, 2018 at 7:18 AM, Romain Manni-Bucau > > wrote:
>>
>>> Don't forget beam doesn't support much behind it (mainly only a few
>>> ParameterizedType due the usage code path) so it is mainly only about
>>> handling parameterized types and typevariables recursively. Not a lot of
>>> work. Main concern being it is in the API so using a shade as an API is
>>> never a good idea, in particular cause the shade can be broken and requires
>>> to setup clirr or things like that and when it breaks you are done and need
>>> to fork it anyway. Limiting the dependencies for an API - as beam is - is
>>> always saner even if it requires a small fork of code.
>>>
>>
>> The main thing that TypeToken is used for is capturing generics that are
>> lost by Java reflection. It is a bit tricky, actually.
>>
>> Kenn
>>
>>
>>
>>>
>>> Romain Manni-Bucau
>>> @rmannibucau  |  Blog
>>>  | Old Blog
>>>  | Github
>>>  | LinkedIn
>>>  | Book
>>> 
>>>
>>> 2018-02-02 15:49 GMT+01:00 Kenneth Knowles :
>>>
 On Fri, Feb 2, 2018 at 6:41 AM, Romain Manni-Bucau <
 rmannibu...@gmail.com> wrote:

>
>
> 2018-02-02 15:37 GMT+01:00 Kenneth Knowles :
>
>> Another couple:
>>
>>  - User-facing TypeDescriptor is a very thin wrapper on Guava's
>> TypeToken
>>
>
> Technically reflect Type is enough
>

 If you want to try to remove TypeToken from underneath TypeDescriptor,
 I have no objections as long as you expand the test suite to cover all the
 functionality where we trust TypeToken's tests. Good luck :-)

 Kenn


>
>
>>  - ImmutableList and friends and their builders are very widely used
>> and IMO still add a lot for readability and preventing someone coming 
>> along
>> and adding mistakes to a codebase
>>
>
> Sugar but not required. When you compare the cost of a shade versus of
> duplicating the parts we need there is no real match IMHO.
>
>
>>
>> So considering it all, I would keep a vendored Guava (but also move
>> off where we can, and also have our own improvements). I hope it will be 
>> a
>> near-empty build file to generate it so not a maintenance burden.
>>
>> Kenn
>>
>> On Thu, Feb 1, 2018 at 8:44 PM, Kenneth Knowles 
>> wrote:
>>
>>> Nice. This sounds like a great idea (or two?) and goes along with
>>> what I just started for futures.
>>>
>>> Guava: filed https://issues.apache.org/jira/browse/BEAM-3606 and
>>> assigned to Ismaël :-) and converted my futures thing to a subtask.
>>>
>>> Specific things for our micro guava:
>>>
>>>  - checkArgumentNotNull can throw the right exception
>>>  - our own Optional because Java's is not Serializable
>>>  - futures combinators since many are missing, especially Java's
>>> don't do exceptions right
>>>
>>> Protobuf: didn't file an issue because I'm not sure
>>>
>>> I was wondering if pre-shading works. We really need to get rid of
>>> it from public APIs in a 100% reliable way. It is also a problem for
>>> Dataflow. I was wondering if one approach is to pre-shade
>>> gcpio-protobuf-java, gcpio-protobuf-java-util, gcpio-grpc-java, etc.
>>> Anything that needs to take a Message object. (and do the same for
>>> beam-model-protobuf-java since the model bits have to depend on each 
>>> other
>>> but nothing else).
>>>
>>> Kenn
>>>
>>> On Thu, Feb 1, 2018 at 1:56 AM, Ismaël Mejía 

Re: [DISCUSS] [Java] Private shaded dependency uber jars

2018-02-02 Thread Romain Manni-Bucau
Yep, note I never said to reinvent the wheel, we can copy it from guava,
openwebbeans or any other impl. Point was more to avoid to depend on
something we don't own for that which is after all not that much code. I
also think we can limit it a lot to align it on what is supported by beam
(I'm thinking to coders here) but this can be another topic.


Romain Manni-Bucau
@rmannibucau  |  Blog
 | Old Blog
 | Github  |
LinkedIn  | Book


2018-02-02 16:33 GMT+01:00 Kenneth Knowles :

> On Fri, Feb 2, 2018 at 7:18 AM, Romain Manni-Bucau 
> wrote:
>
>> Don't forget beam doesn't support much behind it (mainly only a few
>> ParameterizedType due the usage code path) so it is mainly only about
>> handling parameterized types and typevariables recursively. Not a lot of
>> work. Main concern being it is in the API so using a shade as an API is
>> never a good idea, in particular cause the shade can be broken and requires
>> to setup clirr or things like that and when it breaks you are done and need
>> to fork it anyway. Limiting the dependencies for an API - as beam is - is
>> always saner even if it requires a small fork of code.
>>
>
> The main thing that TypeToken is used for is capturing generics that are
> lost by Java reflection. It is a bit tricky, actually.
>
> Kenn
>
>
>
>>
>> Romain Manni-Bucau
>> @rmannibucau  |  Blog
>>  | Old Blog
>>  | Github
>>  | LinkedIn
>>  | Book
>> 
>>
>> 2018-02-02 15:49 GMT+01:00 Kenneth Knowles :
>>
>>> On Fri, Feb 2, 2018 at 6:41 AM, Romain Manni-Bucau <
>>> rmannibu...@gmail.com> wrote:
>>>


 2018-02-02 15:37 GMT+01:00 Kenneth Knowles :

> Another couple:
>
>  - User-facing TypeDescriptor is a very thin wrapper on Guava's
> TypeToken
>

 Technically reflect Type is enough

>>>
>>> If you want to try to remove TypeToken from underneath TypeDescriptor, I
>>> have no objections as long as you expand the test suite to cover all the
>>> functionality where we trust TypeToken's tests. Good luck :-)
>>>
>>> Kenn
>>>
>>>


>  - ImmutableList and friends and their builders are very widely used
> and IMO still add a lot for readability and preventing someone coming 
> along
> and adding mistakes to a codebase
>

 Sugar but not required. When you compare the cost of a shade versus of
 duplicating the parts we need there is no real match IMHO.


>
> So considering it all, I would keep a vendored Guava (but also move
> off where we can, and also have our own improvements). I hope it will be a
> near-empty build file to generate it so not a maintenance burden.
>
> Kenn
>
> On Thu, Feb 1, 2018 at 8:44 PM, Kenneth Knowles 
> wrote:
>
>> Nice. This sounds like a great idea (or two?) and goes along with
>> what I just started for futures.
>>
>> Guava: filed https://issues.apache.org/jira/browse/BEAM-3606 and
>> assigned to Ismaël :-) and converted my futures thing to a subtask.
>>
>> Specific things for our micro guava:
>>
>>  - checkArgumentNotNull can throw the right exception
>>  - our own Optional because Java's is not Serializable
>>  - futures combinators since many are missing, especially Java's
>> don't do exceptions right
>>
>> Protobuf: didn't file an issue because I'm not sure
>>
>> I was wondering if pre-shading works. We really need to get rid of it
>> from public APIs in a 100% reliable way. It is also a problem for 
>> Dataflow.
>> I was wondering if one approach is to pre-shade gcpio-protobuf-java,
>> gcpio-protobuf-java-util, gcpio-grpc-java, etc. Anything that needs to 
>> take
>> a Message object. (and do the same for beam-model-protobuf-java since the
>> model bits have to depend on each other but nothing else).
>>
>> Kenn
>>
>> On Thu, Feb 1, 2018 at 1:56 AM, Ismaël Mejía 
>> wrote:
>>
>>> Huge +1 to get rid of Guava!
>>>
>>> This solves annoying dependency issues for some IOs and allow us to
>>> get rid of the shading that makes current jars bigger than they
>>> should
>>> be.
>>>
>>> We can create our own 'micro guava' package with some classes for
>>> things that are hard to migrate, or that we  prefer to still have
>>> like
>>> the check* methods for example. Given the size of 

Re: [DISCUSS] [Java] Private shaded dependency uber jars

2018-02-02 Thread Kenneth Knowles
On Fri, Feb 2, 2018 at 7:18 AM, Romain Manni-Bucau 
wrote:

> Don't forget beam doesn't support much behind it (mainly only a few
> ParameterizedType due the usage code path) so it is mainly only about
> handling parameterized types and typevariables recursively. Not a lot of
> work. Main concern being it is in the API so using a shade as an API is
> never a good idea, in particular cause the shade can be broken and requires
> to setup clirr or things like that and when it breaks you are done and need
> to fork it anyway. Limiting the dependencies for an API - as beam is - is
> always saner even if it requires a small fork of code.
>

The main thing that TypeToken is used for is capturing generics that are
lost by Java reflection. It is a bit tricky, actually.

Kenn



>
> Romain Manni-Bucau
> @rmannibucau  |  Blog
>  | Old Blog
>  | Github
>  | LinkedIn
>  | Book
> 
>
> 2018-02-02 15:49 GMT+01:00 Kenneth Knowles :
>
>> On Fri, Feb 2, 2018 at 6:41 AM, Romain Manni-Bucau > > wrote:
>>
>>>
>>>
>>> 2018-02-02 15:37 GMT+01:00 Kenneth Knowles :
>>>
 Another couple:

  - User-facing TypeDescriptor is a very thin wrapper on Guava's
 TypeToken

>>>
>>> Technically reflect Type is enough
>>>
>>
>> If you want to try to remove TypeToken from underneath TypeDescriptor, I
>> have no objections as long as you expand the test suite to cover all the
>> functionality where we trust TypeToken's tests. Good luck :-)
>>
>> Kenn
>>
>>
>>>
>>>
  - ImmutableList and friends and their builders are very widely used
 and IMO still add a lot for readability and preventing someone coming along
 and adding mistakes to a codebase

>>>
>>> Sugar but not required. When you compare the cost of a shade versus of
>>> duplicating the parts we need there is no real match IMHO.
>>>
>>>

 So considering it all, I would keep a vendored Guava (but also move off
 where we can, and also have our own improvements). I hope it will be a
 near-empty build file to generate it so not a maintenance burden.

 Kenn

 On Thu, Feb 1, 2018 at 8:44 PM, Kenneth Knowles  wrote:

> Nice. This sounds like a great idea (or two?) and goes along with what
> I just started for futures.
>
> Guava: filed https://issues.apache.org/jira/browse/BEAM-3606 and
> assigned to Ismaël :-) and converted my futures thing to a subtask.
>
> Specific things for our micro guava:
>
>  - checkArgumentNotNull can throw the right exception
>  - our own Optional because Java's is not Serializable
>  - futures combinators since many are missing, especially Java's don't
> do exceptions right
>
> Protobuf: didn't file an issue because I'm not sure
>
> I was wondering if pre-shading works. We really need to get rid of it
> from public APIs in a 100% reliable way. It is also a problem for 
> Dataflow.
> I was wondering if one approach is to pre-shade gcpio-protobuf-java,
> gcpio-protobuf-java-util, gcpio-grpc-java, etc. Anything that needs to 
> take
> a Message object. (and do the same for beam-model-protobuf-java since the
> model bits have to depend on each other but nothing else).
>
> Kenn
>
> On Thu, Feb 1, 2018 at 1:56 AM, Ismaël Mejía 
> wrote:
>
>> Huge +1 to get rid of Guava!
>>
>> This solves annoying dependency issues for some IOs and allow us to
>> get rid of the shading that makes current jars bigger than they should
>> be.
>>
>> We can create our own 'micro guava' package with some classes for
>> things that are hard to migrate, or that we  prefer to still have like
>> the check* methods for example. Given the size of the task we should
>> probably divide it into subtasks, more important is to get rid of it
>> for 'sdks/java/core'. We can then attack other areas afterwards.
>>
>> Other important idea would be to get rid of Protobuf in public APIs
>> like GCPIO and to better shade it from leaking into the runners. An
>> unexpected side effect of this is a leak of netty via gRPC/protobuf
>> that is byting us for the Spark runner, but well that's worth a
>> different discussion.
>>
>>
>> On Thu, Feb 1, 2018 at 10:08 AM, Romain Manni-Bucau
>>  wrote:
>> > a map of list is fine and not a challenge we'll face long I hope ;)
>> >
>> >
>> > Romain Manni-Bucau
>> > @rmannibucau |  Blog | Old Blog | Github | LinkedIn
>> >
>> > 2018-02-01 9:40 GMT+01:00 Reuven Lax :
>> >>
>> >> Not sure 

Re: [DISCUSS] [Java] Private shaded dependency uber jars

2018-02-02 Thread Romain Manni-Bucau
Don't forget beam doesn't support much behind it (mainly only a few
ParameterizedType due the usage code path) so it is mainly only about
handling parameterized types and typevariables recursively. Not a lot of
work. Main concern being it is in the API so using a shade as an API is
never a good idea, in particular cause the shade can be broken and requires
to setup clirr or things like that and when it breaks you are done and need
to fork it anyway. Limiting the dependencies for an API - as beam is - is
always saner even if it requires a small fork of code.


Romain Manni-Bucau
@rmannibucau  |  Blog
 | Old Blog
 | Github  |
LinkedIn  | Book


2018-02-02 15:49 GMT+01:00 Kenneth Knowles :

> On Fri, Feb 2, 2018 at 6:41 AM, Romain Manni-Bucau 
> wrote:
>
>>
>>
>> 2018-02-02 15:37 GMT+01:00 Kenneth Knowles :
>>
>>> Another couple:
>>>
>>>  - User-facing TypeDescriptor is a very thin wrapper on Guava's TypeToken
>>>
>>
>> Technically reflect Type is enough
>>
>
> If you want to try to remove TypeToken from underneath TypeDescriptor, I
> have no objections as long as you expand the test suite to cover all the
> functionality where we trust TypeToken's tests. Good luck :-)
>
> Kenn
>
>
>>
>>
>>>  - ImmutableList and friends and their builders are very widely used and
>>> IMO still add a lot for readability and preventing someone coming along and
>>> adding mistakes to a codebase
>>>
>>
>> Sugar but not required. When you compare the cost of a shade versus of
>> duplicating the parts we need there is no real match IMHO.
>>
>>
>>>
>>> So considering it all, I would keep a vendored Guava (but also move off
>>> where we can, and also have our own improvements). I hope it will be a
>>> near-empty build file to generate it so not a maintenance burden.
>>>
>>> Kenn
>>>
>>> On Thu, Feb 1, 2018 at 8:44 PM, Kenneth Knowles  wrote:
>>>
 Nice. This sounds like a great idea (or two?) and goes along with what
 I just started for futures.

 Guava: filed https://issues.apache.org/jira/browse/BEAM-3606 and
 assigned to Ismaël :-) and converted my futures thing to a subtask.

 Specific things for our micro guava:

  - checkArgumentNotNull can throw the right exception
  - our own Optional because Java's is not Serializable
  - futures combinators since many are missing, especially Java's don't
 do exceptions right

 Protobuf: didn't file an issue because I'm not sure

 I was wondering if pre-shading works. We really need to get rid of it
 from public APIs in a 100% reliable way. It is also a problem for Dataflow.
 I was wondering if one approach is to pre-shade gcpio-protobuf-java,
 gcpio-protobuf-java-util, gcpio-grpc-java, etc. Anything that needs to take
 a Message object. (and do the same for beam-model-protobuf-java since the
 model bits have to depend on each other but nothing else).

 Kenn

 On Thu, Feb 1, 2018 at 1:56 AM, Ismaël Mejía  wrote:

> Huge +1 to get rid of Guava!
>
> This solves annoying dependency issues for some IOs and allow us to
> get rid of the shading that makes current jars bigger than they should
> be.
>
> We can create our own 'micro guava' package with some classes for
> things that are hard to migrate, or that we  prefer to still have like
> the check* methods for example. Given the size of the task we should
> probably divide it into subtasks, more important is to get rid of it
> for 'sdks/java/core'. We can then attack other areas afterwards.
>
> Other important idea would be to get rid of Protobuf in public APIs
> like GCPIO and to better shade it from leaking into the runners. An
> unexpected side effect of this is a leak of netty via gRPC/protobuf
> that is byting us for the Spark runner, but well that's worth a
> different discussion.
>
>
> On Thu, Feb 1, 2018 at 10:08 AM, Romain Manni-Bucau
>  wrote:
> > a map of list is fine and not a challenge we'll face long I hope ;)
> >
> >
> > Romain Manni-Bucau
> > @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> >
> > 2018-02-01 9:40 GMT+01:00 Reuven Lax :
> >>
> >> Not sure we'll be able to replace them all. Things like guava Table
> and
> >> Multimap don't have great replacements in Java8.
> >>
> >> On Wed, Jan 31, 2018 at 10:11 PM, Jean-Baptiste Onofré <
> j...@nanthrax.net>
> >> wrote:
> >>>
> >>> +1, it was on my TODO for a while waiting the Java8 update.
> >>>
> >>> Regards
> >>> JB
> >>>
> 

Re: [DISCUSS] [Java] Private shaded dependency uber jars

2018-02-02 Thread Kenneth Knowles
On Fri, Feb 2, 2018 at 6:41 AM, Romain Manni-Bucau 
wrote:

>
>
> 2018-02-02 15:37 GMT+01:00 Kenneth Knowles :
>
>> Another couple:
>>
>>  - User-facing TypeDescriptor is a very thin wrapper on Guava's TypeToken
>>
>
> Technically reflect Type is enough
>

If you want to try to remove TypeToken from underneath TypeDescriptor, I
have no objections as long as you expand the test suite to cover all the
functionality where we trust TypeToken's tests. Good luck :-)

Kenn


>
>
>>  - ImmutableList and friends and their builders are very widely used and
>> IMO still add a lot for readability and preventing someone coming along and
>> adding mistakes to a codebase
>>
>
> Sugar but not required. When you compare the cost of a shade versus of
> duplicating the parts we need there is no real match IMHO.
>
>
>>
>> So considering it all, I would keep a vendored Guava (but also move off
>> where we can, and also have our own improvements). I hope it will be a
>> near-empty build file to generate it so not a maintenance burden.
>>
>> Kenn
>>
>> On Thu, Feb 1, 2018 at 8:44 PM, Kenneth Knowles  wrote:
>>
>>> Nice. This sounds like a great idea (or two?) and goes along with what I
>>> just started for futures.
>>>
>>> Guava: filed https://issues.apache.org/jira/browse/BEAM-3606 and
>>> assigned to Ismaël :-) and converted my futures thing to a subtask.
>>>
>>> Specific things for our micro guava:
>>>
>>>  - checkArgumentNotNull can throw the right exception
>>>  - our own Optional because Java's is not Serializable
>>>  - futures combinators since many are missing, especially Java's don't
>>> do exceptions right
>>>
>>> Protobuf: didn't file an issue because I'm not sure
>>>
>>> I was wondering if pre-shading works. We really need to get rid of it
>>> from public APIs in a 100% reliable way. It is also a problem for Dataflow.
>>> I was wondering if one approach is to pre-shade gcpio-protobuf-java,
>>> gcpio-protobuf-java-util, gcpio-grpc-java, etc. Anything that needs to take
>>> a Message object. (and do the same for beam-model-protobuf-java since the
>>> model bits have to depend on each other but nothing else).
>>>
>>> Kenn
>>>
>>> On Thu, Feb 1, 2018 at 1:56 AM, Ismaël Mejía  wrote:
>>>
 Huge +1 to get rid of Guava!

 This solves annoying dependency issues for some IOs and allow us to
 get rid of the shading that makes current jars bigger than they should
 be.

 We can create our own 'micro guava' package with some classes for
 things that are hard to migrate, or that we  prefer to still have like
 the check* methods for example. Given the size of the task we should
 probably divide it into subtasks, more important is to get rid of it
 for 'sdks/java/core'. We can then attack other areas afterwards.

 Other important idea would be to get rid of Protobuf in public APIs
 like GCPIO and to better shade it from leaking into the runners. An
 unexpected side effect of this is a leak of netty via gRPC/protobuf
 that is byting us for the Spark runner, but well that's worth a
 different discussion.


 On Thu, Feb 1, 2018 at 10:08 AM, Romain Manni-Bucau
  wrote:
 > a map of list is fine and not a challenge we'll face long I hope ;)
 >
 >
 > Romain Manni-Bucau
 > @rmannibucau |  Blog | Old Blog | Github | LinkedIn
 >
 > 2018-02-01 9:40 GMT+01:00 Reuven Lax :
 >>
 >> Not sure we'll be able to replace them all. Things like guava Table
 and
 >> Multimap don't have great replacements in Java8.
 >>
 >> On Wed, Jan 31, 2018 at 10:11 PM, Jean-Baptiste Onofré <
 j...@nanthrax.net>
 >> wrote:
 >>>
 >>> +1, it was on my TODO for a while waiting the Java8 update.
 >>>
 >>> Regards
 >>> JB
 >>>
 >>> On 02/01/2018 06:56 AM, Romain Manni-Bucau wrote:
 >>> > Why not dropping guava for all beam codebase? With java 8 it is
 quite
 >>> > easy to do
 >>> > it and avoid a bunch of conflicts. Did it in 2 projects with
 quite a
 >>> > good result.
 >>> >
 >>> > Le 1 févr. 2018 06:50, "Lukasz Cwik" >> > > a écrit :
 >>> >
 >>> > Make sure to include the guava version in the artifact name
 so that
 >>> > we can
 >>> > have multiple vendored versions.
 >>> >
 >>> > On Wed, Jan 31, 2018 at 9:16 PM, Kenneth Knowles <
 k...@google.com
 >>> > > wrote:
 >>> >
 >>> > I didn't have time for this, but it just bit me. We
 definitely
 >>> > have
 >>> > Guava on the API surface of runner support code in ways
 that
 >>> > get
 >>> > incompatibly shaded. I will probably start "1a" by making
 a
 >>> > shaded
 >>> > library org.apache.beam:vendored-guava 

Re: [DISCUSS] [Java] Private shaded dependency uber jars

2018-02-02 Thread Romain Manni-Bucau
2018-02-02 15:37 GMT+01:00 Kenneth Knowles :

> Another couple:
>
>  - User-facing TypeDescriptor is a very thin wrapper on Guava's TypeToken
>

Technically reflect Type is enough


>  - ImmutableList and friends and their builders are very widely used and
> IMO still add a lot for readability and preventing someone coming along and
> adding mistakes to a codebase
>

Sugar but not required. When you compare the cost of a shade versus of
duplicating the parts we need there is no real match IMHO.


>
> So considering it all, I would keep a vendored Guava (but also move off
> where we can, and also have our own improvements). I hope it will be a
> near-empty build file to generate it so not a maintenance burden.
>
> Kenn
>
> On Thu, Feb 1, 2018 at 8:44 PM, Kenneth Knowles  wrote:
>
>> Nice. This sounds like a great idea (or two?) and goes along with what I
>> just started for futures.
>>
>> Guava: filed https://issues.apache.org/jira/browse/BEAM-3606 and
>> assigned to Ismaël :-) and converted my futures thing to a subtask.
>>
>> Specific things for our micro guava:
>>
>>  - checkArgumentNotNull can throw the right exception
>>  - our own Optional because Java's is not Serializable
>>  - futures combinators since many are missing, especially Java's don't do
>> exceptions right
>>
>> Protobuf: didn't file an issue because I'm not sure
>>
>> I was wondering if pre-shading works. We really need to get rid of it
>> from public APIs in a 100% reliable way. It is also a problem for Dataflow.
>> I was wondering if one approach is to pre-shade gcpio-protobuf-java,
>> gcpio-protobuf-java-util, gcpio-grpc-java, etc. Anything that needs to take
>> a Message object. (and do the same for beam-model-protobuf-java since the
>> model bits have to depend on each other but nothing else).
>>
>> Kenn
>>
>> On Thu, Feb 1, 2018 at 1:56 AM, Ismaël Mejía  wrote:
>>
>>> Huge +1 to get rid of Guava!
>>>
>>> This solves annoying dependency issues for some IOs and allow us to
>>> get rid of the shading that makes current jars bigger than they should
>>> be.
>>>
>>> We can create our own 'micro guava' package with some classes for
>>> things that are hard to migrate, or that we  prefer to still have like
>>> the check* methods for example. Given the size of the task we should
>>> probably divide it into subtasks, more important is to get rid of it
>>> for 'sdks/java/core'. We can then attack other areas afterwards.
>>>
>>> Other important idea would be to get rid of Protobuf in public APIs
>>> like GCPIO and to better shade it from leaking into the runners. An
>>> unexpected side effect of this is a leak of netty via gRPC/protobuf
>>> that is byting us for the Spark runner, but well that's worth a
>>> different discussion.
>>>
>>>
>>> On Thu, Feb 1, 2018 at 10:08 AM, Romain Manni-Bucau
>>>  wrote:
>>> > a map of list is fine and not a challenge we'll face long I hope ;)
>>> >
>>> >
>>> > Romain Manni-Bucau
>>> > @rmannibucau |  Blog | Old Blog | Github | LinkedIn
>>> >
>>> > 2018-02-01 9:40 GMT+01:00 Reuven Lax :
>>> >>
>>> >> Not sure we'll be able to replace them all. Things like guava Table
>>> and
>>> >> Multimap don't have great replacements in Java8.
>>> >>
>>> >> On Wed, Jan 31, 2018 at 10:11 PM, Jean-Baptiste Onofré <
>>> j...@nanthrax.net>
>>> >> wrote:
>>> >>>
>>> >>> +1, it was on my TODO for a while waiting the Java8 update.
>>> >>>
>>> >>> Regards
>>> >>> JB
>>> >>>
>>> >>> On 02/01/2018 06:56 AM, Romain Manni-Bucau wrote:
>>> >>> > Why not dropping guava for all beam codebase? With java 8 it is
>>> quite
>>> >>> > easy to do
>>> >>> > it and avoid a bunch of conflicts. Did it in 2 projects with quite
>>> a
>>> >>> > good result.
>>> >>> >
>>> >>> > Le 1 févr. 2018 06:50, "Lukasz Cwik" >> >>> > > a écrit :
>>> >>> >
>>> >>> > Make sure to include the guava version in the artifact name so
>>> that
>>> >>> > we can
>>> >>> > have multiple vendored versions.
>>> >>> >
>>> >>> > On Wed, Jan 31, 2018 at 9:16 PM, Kenneth Knowles <
>>> k...@google.com
>>> >>> > > wrote:
>>> >>> >
>>> >>> > I didn't have time for this, but it just bit me. We
>>> definitely
>>> >>> > have
>>> >>> > Guava on the API surface of runner support code in ways
>>> that
>>> >>> > get
>>> >>> > incompatibly shaded. I will probably start "1a" by making a
>>> >>> > shaded
>>> >>> > library org.apache.beam:vendored-guava and starting to use
>>> it.
>>> >>> > It sounds
>>> >>> > like there is generally unanimous support for that much,
>>> >>> > anyhow.
>>> >>> >
>>> >>> > Kenn
>>> >>> >
>>> >>> > On Wed, Dec 13, 2017 at 8:31 AM, Aljoscha Krettek
>>> >>> > >> >>> > > wrote:
>>> >>> >
>>> >>> > Thanks Ismaël for bringing up this discussion again!
>>> >>> >
>>> >>> > 

Re: [DISCUSS] [Java] Private shaded dependency uber jars

2018-02-01 Thread Kenneth Knowles
Nice. This sounds like a great idea (or two?) and goes along with what I
just started for futures.

Guava: filed https://issues.apache.org/jira/browse/BEAM-3606 and assigned
to Ismaël :-) and converted my futures thing to a subtask.

Specific things for our micro guava:

 - checkArgumentNotNull can throw the right exception
 - our own Optional because Java's is not Serializable
 - futures combinators since many are missing, especially Java's don't do
exceptions right

Protobuf: didn't file an issue because I'm not sure

I was wondering if pre-shading works. We really need to get rid of it from
public APIs in a 100% reliable way. It is also a problem for Dataflow. I
was wondering if one approach is to pre-shade gcpio-protobuf-java,
gcpio-protobuf-java-util, gcpio-grpc-java, etc. Anything that needs to take
a Message object. (and do the same for beam-model-protobuf-java since the
model bits have to depend on each other but nothing else).

Kenn

On Thu, Feb 1, 2018 at 1:56 AM, Ismaël Mejía  wrote:

> Huge +1 to get rid of Guava!
>
> This solves annoying dependency issues for some IOs and allow us to
> get rid of the shading that makes current jars bigger than they should
> be.
>
> We can create our own 'micro guava' package with some classes for
> things that are hard to migrate, or that we  prefer to still have like
> the check* methods for example. Given the size of the task we should
> probably divide it into subtasks, more important is to get rid of it
> for 'sdks/java/core'. We can then attack other areas afterwards.
>
> Other important idea would be to get rid of Protobuf in public APIs
> like GCPIO and to better shade it from leaking into the runners. An
> unexpected side effect of this is a leak of netty via gRPC/protobuf
> that is byting us for the Spark runner, but well that's worth a
> different discussion.
>
>
> On Thu, Feb 1, 2018 at 10:08 AM, Romain Manni-Bucau
>  wrote:
> > a map of list is fine and not a challenge we'll face long I hope ;)
> >
> >
> > Romain Manni-Bucau
> > @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> >
> > 2018-02-01 9:40 GMT+01:00 Reuven Lax :
> >>
> >> Not sure we'll be able to replace them all. Things like guava Table and
> >> Multimap don't have great replacements in Java8.
> >>
> >> On Wed, Jan 31, 2018 at 10:11 PM, Jean-Baptiste Onofré  >
> >> wrote:
> >>>
> >>> +1, it was on my TODO for a while waiting the Java8 update.
> >>>
> >>> Regards
> >>> JB
> >>>
> >>> On 02/01/2018 06:56 AM, Romain Manni-Bucau wrote:
> >>> > Why not dropping guava for all beam codebase? With java 8 it is quite
> >>> > easy to do
> >>> > it and avoid a bunch of conflicts. Did it in 2 projects with quite a
> >>> > good result.
> >>> >
> >>> > Le 1 févr. 2018 06:50, "Lukasz Cwik"  >>> > > a écrit :
> >>> >
> >>> > Make sure to include the guava version in the artifact name so
> that
> >>> > we can
> >>> > have multiple vendored versions.
> >>> >
> >>> > On Wed, Jan 31, 2018 at 9:16 PM, Kenneth Knowles  >>> > > wrote:
> >>> >
> >>> > I didn't have time for this, but it just bit me. We
> definitely
> >>> > have
> >>> > Guava on the API surface of runner support code in ways that
> >>> > get
> >>> > incompatibly shaded. I will probably start "1a" by making a
> >>> > shaded
> >>> > library org.apache.beam:vendored-guava and starting to use
> it.
> >>> > It sounds
> >>> > like there is generally unanimous support for that much,
> >>> > anyhow.
> >>> >
> >>> > Kenn
> >>> >
> >>> > On Wed, Dec 13, 2017 at 8:31 AM, Aljoscha Krettek
> >>> >  >>> > > wrote:
> >>> >
> >>> > Thanks Ismaël for bringing up this discussion again!
> >>> >
> >>> > I would be in favour of 1) and more specifically of 1a)
> >>> >
> >>> > Aljoscha
> >>> >
> >>> >
> >>> >> On 12. Dec 2017, at 18:56, Lukasz Cwik <
> lc...@google.com
> >>> >> > wrote:
> >>> >>
> >>> >> You can always run tests on post shaded artifacts
> instead
> >>> >> of the
> >>> >> compiled classes, it just requires us to change our
> maven
> >>> >> surefire
> >>> >> / gradle test configurations but it is true that most
> IDEs
> >>> >> would
> >>> >> behave better with a dependency jar unless you delegate
> >>> >> all the
> >>> >> build/test actions to the build system and then it won't
> >>> >> matter.
> >>> >>
> >>> >> On Mon, Dec 11, 2017 at 9:05 PM, Kenneth Knowles
> >>> >>  >>> >> > wrote:
> >>> >>
> >>> >> There's also, with additional overhead,
> >>> >>
> >>> >> 1a) A relocated and shipped package for each thing
> we
> 

Re: [DISCUSS] [Java] Private shaded dependency uber jars

2018-02-01 Thread Ismaël Mejía
Huge +1 to get rid of Guava!

This solves annoying dependency issues for some IOs and allow us to
get rid of the shading that makes current jars bigger than they should
be.

We can create our own 'micro guava' package with some classes for
things that are hard to migrate, or that we  prefer to still have like
the check* methods for example. Given the size of the task we should
probably divide it into subtasks, more important is to get rid of it
for 'sdks/java/core'. We can then attack other areas afterwards.

Other important idea would be to get rid of Protobuf in public APIs
like GCPIO and to better shade it from leaking into the runners. An
unexpected side effect of this is a leak of netty via gRPC/protobuf
that is byting us for the Spark runner, but well that's worth a
different discussion.


On Thu, Feb 1, 2018 at 10:08 AM, Romain Manni-Bucau
 wrote:
> a map of list is fine and not a challenge we'll face long I hope ;)
>
>
> Romain Manni-Bucau
> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
>
> 2018-02-01 9:40 GMT+01:00 Reuven Lax :
>>
>> Not sure we'll be able to replace them all. Things like guava Table and
>> Multimap don't have great replacements in Java8.
>>
>> On Wed, Jan 31, 2018 at 10:11 PM, Jean-Baptiste Onofré 
>> wrote:
>>>
>>> +1, it was on my TODO for a while waiting the Java8 update.
>>>
>>> Regards
>>> JB
>>>
>>> On 02/01/2018 06:56 AM, Romain Manni-Bucau wrote:
>>> > Why not dropping guava for all beam codebase? With java 8 it is quite
>>> > easy to do
>>> > it and avoid a bunch of conflicts. Did it in 2 projects with quite a
>>> > good result.
>>> >
>>> > Le 1 févr. 2018 06:50, "Lukasz Cwik" >> > > a écrit :
>>> >
>>> > Make sure to include the guava version in the artifact name so that
>>> > we can
>>> > have multiple vendored versions.
>>> >
>>> > On Wed, Jan 31, 2018 at 9:16 PM, Kenneth Knowles >> > > wrote:
>>> >
>>> > I didn't have time for this, but it just bit me. We definitely
>>> > have
>>> > Guava on the API surface of runner support code in ways that
>>> > get
>>> > incompatibly shaded. I will probably start "1a" by making a
>>> > shaded
>>> > library org.apache.beam:vendored-guava and starting to use it.
>>> > It sounds
>>> > like there is generally unanimous support for that much,
>>> > anyhow.
>>> >
>>> > Kenn
>>> >
>>> > On Wed, Dec 13, 2017 at 8:31 AM, Aljoscha Krettek
>>> > >> > > wrote:
>>> >
>>> > Thanks Ismaël for bringing up this discussion again!
>>> >
>>> > I would be in favour of 1) and more specifically of 1a)
>>> >
>>> > Aljoscha
>>> >
>>> >
>>> >> On 12. Dec 2017, at 18:56, Lukasz Cwik >> >> > wrote:
>>> >>
>>> >> You can always run tests on post shaded artifacts instead
>>> >> of the
>>> >> compiled classes, it just requires us to change our maven
>>> >> surefire
>>> >> / gradle test configurations but it is true that most IDEs
>>> >> would
>>> >> behave better with a dependency jar unless you delegate
>>> >> all the
>>> >> build/test actions to the build system and then it won't
>>> >> matter.
>>> >>
>>> >> On Mon, Dec 11, 2017 at 9:05 PM, Kenneth Knowles
>>> >> >> >> > wrote:
>>> >>
>>> >> There's also, with additional overhead,
>>> >>
>>> >> 1a) A relocated and shipped package for each thing we
>>> >> want to
>>> >> relocate. I think this has also been tried outside
>>> >> Beam...
>>> >>
>>> >> Pros:
>>> >> * all the pros of 1) plus no bloat beyond what is
>>> >> necessary
>>> >> Cons:
>>> >> * abandons whitelist approach for public deps,
>>> >> reverting to
>>> >> blacklist approach for trouble things like guava, so a
>>> >> bit
>>> >> less principled
>>> >>
>>> >> For both 1) and 1a) I would add:
>>> >>
>>> >> Pros:
>>> >> * clearly readable dependency since code will `import
>>> >> org.apache.beam.private.guava21` and IDEs will
>>> >> understand it
>>> >> is a distinct lilbrary
>>> >> * can run tests on unpackaged classes, as long as deps
>>> >> are
>>> >> shaded or provided as jars
>>> >> * no mysterious action at a distance from inherited
>>> >> configuration
>>> >> Cons:
>>> >> * need to adjust imports
>>> >>
>>> >> Kenn
>>> >>
>>> >> On Mon, Dec 11, 2017 at 9:57 AM, Lukasz Cwik
>>> >> >> >>

Re: [DISCUSS] [Java] Private shaded dependency uber jars

2018-02-01 Thread Romain Manni-Bucau
a map of list is fine and not a challenge we'll face long I hope ;)


Romain Manni-Bucau
@rmannibucau  |  Blog
 | Old Blog
 | Github  |
LinkedIn 

2018-02-01 9:40 GMT+01:00 Reuven Lax :

> Not sure we'll be able to replace them all. Things like guava Table and
> Multimap don't have great replacements in Java8.
>
> On Wed, Jan 31, 2018 at 10:11 PM, Jean-Baptiste Onofré 
> wrote:
>
>> +1, it was on my TODO for a while waiting the Java8 update.
>>
>> Regards
>> JB
>>
>> On 02/01/2018 06:56 AM, Romain Manni-Bucau wrote:
>> > Why not dropping guava for all beam codebase? With java 8 it is quite
>> easy to do
>> > it and avoid a bunch of conflicts. Did it in 2 projects with quite a
>> good result.
>> >
>> > Le 1 févr. 2018 06:50, "Lukasz Cwik" > > > a écrit :
>> >
>> > Make sure to include the guava version in the artifact name so that
>> we can
>> > have multiple vendored versions.
>> >
>> > On Wed, Jan 31, 2018 at 9:16 PM, Kenneth Knowles > > > wrote:
>> >
>> > I didn't have time for this, but it just bit me. We definitely
>> have
>> > Guava on the API surface of runner support code in ways that get
>> > incompatibly shaded. I will probably start "1a" by making a
>> shaded
>> > library org.apache.beam:vendored-guava and starting to use it.
>> It sounds
>> > like there is generally unanimous support for that much, anyhow.
>> >
>> > Kenn
>> >
>> > On Wed, Dec 13, 2017 at 8:31 AM, Aljoscha Krettek <
>> aljos...@apache.org
>> > > wrote:
>> >
>> > Thanks Ismaël for bringing up this discussion again!
>> >
>> > I would be in favour of 1) and more specifically of 1a)
>> >
>> > Aljoscha
>> >
>> >
>> >> On 12. Dec 2017, at 18:56, Lukasz Cwik > >> > wrote:
>> >>
>> >> You can always run tests on post shaded artifacts instead
>> of the
>> >> compiled classes, it just requires us to change our maven
>> surefire
>> >> / gradle test configurations but it is true that most IDEs
>> would
>> >> behave better with a dependency jar unless you delegate
>> all the
>> >> build/test actions to the build system and then it won't
>> matter.
>> >>
>> >> On Mon, Dec 11, 2017 at 9:05 PM, Kenneth Knowles <
>> k...@google.com
>> >> > wrote:
>> >>
>> >> There's also, with additional overhead,
>> >>
>> >> 1a) A relocated and shipped package for each thing we
>> want to
>> >> relocate. I think this has also been tried outside
>> Beam...
>> >>
>> >> Pros:
>> >> * all the pros of 1) plus no bloat beyond what is
>> necessary
>> >> Cons:
>> >> * abandons whitelist approach for public deps,
>> reverting to
>> >> blacklist approach for trouble things like guava, so a
>> bit
>> >> less principled
>> >>
>> >> For both 1) and 1a) I would add:
>> >>
>> >> Pros:
>> >> * clearly readable dependency since code will `import
>> >> org.apache.beam.private.guava21` and IDEs will
>> understand it
>> >> is a distinct lilbrary
>> >> * can run tests on unpackaged classes, as long as deps
>> are
>> >> shaded or provided as jars
>> >> * no mysterious action at a distance from inherited
>> configuration
>> >> Cons:
>> >> * need to adjust imports
>> >>
>> >> Kenn
>> >>
>> >> On Mon, Dec 11, 2017 at 9:57 AM, Lukasz Cwik <
>> lc...@google.com
>> >> > wrote:
>> >>
>> >> I would suggest that either we use:
>> >> 1) A common deps package containing shaded
>> dependencies
>> >> allows for
>> >> Pros
>> >> * doesn't require the user to build an uber jar
>> >> Risks
>> >> * dependencies package will keep growing even if
>> something
>> >> is or isn't needed by all of Apache Beam leading
>> to a
>> >> large jar anyways negating any space savings
>> >>
>> >> 2) Shade within each module to a common location
>> like
>> >> org.apache.beam.relocated.guava
>> >> Pros
>> >> * you only get the shaded dependencies of the
>> things that
>> >>  

Re: [DISCUSS] [Java] Private shaded dependency uber jars

2018-02-01 Thread Reuven Lax
Not sure we'll be able to replace them all. Things like guava Table and
Multimap don't have great replacements in Java8.

On Wed, Jan 31, 2018 at 10:11 PM, Jean-Baptiste Onofré 
wrote:

> +1, it was on my TODO for a while waiting the Java8 update.
>
> Regards
> JB
>
> On 02/01/2018 06:56 AM, Romain Manni-Bucau wrote:
> > Why not dropping guava for all beam codebase? With java 8 it is quite
> easy to do
> > it and avoid a bunch of conflicts. Did it in 2 projects with quite a
> good result.
> >
> > Le 1 févr. 2018 06:50, "Lukasz Cwik"  > > a écrit :
> >
> > Make sure to include the guava version in the artifact name so that
> we can
> > have multiple vendored versions.
> >
> > On Wed, Jan 31, 2018 at 9:16 PM, Kenneth Knowles  > > wrote:
> >
> > I didn't have time for this, but it just bit me. We definitely
> have
> > Guava on the API surface of runner support code in ways that get
> > incompatibly shaded. I will probably start "1a" by making a
> shaded
> > library org.apache.beam:vendored-guava and starting to use it.
> It sounds
> > like there is generally unanimous support for that much, anyhow.
> >
> > Kenn
> >
> > On Wed, Dec 13, 2017 at 8:31 AM, Aljoscha Krettek <
> aljos...@apache.org
> > > wrote:
> >
> > Thanks Ismaël for bringing up this discussion again!
> >
> > I would be in favour of 1) and more specifically of 1a)
> >
> > Aljoscha
> >
> >
> >> On 12. Dec 2017, at 18:56, Lukasz Cwik  >> > wrote:
> >>
> >> You can always run tests on post shaded artifacts instead
> of the
> >> compiled classes, it just requires us to change our maven
> surefire
> >> / gradle test configurations but it is true that most IDEs
> would
> >> behave better with a dependency jar unless you delegate all
> the
> >> build/test actions to the build system and then it won't
> matter.
> >>
> >> On Mon, Dec 11, 2017 at 9:05 PM, Kenneth Knowles <
> k...@google.com
> >> > wrote:
> >>
> >> There's also, with additional overhead,
> >>
> >> 1a) A relocated and shipped package for each thing we
> want to
> >> relocate. I think this has also been tried outside
> Beam...
> >>
> >> Pros:
> >> * all the pros of 1) plus no bloat beyond what is
> necessary
> >> Cons:
> >> * abandons whitelist approach for public deps,
> reverting to
> >> blacklist approach for trouble things like guava, so a
> bit
> >> less principled
> >>
> >> For both 1) and 1a) I would add:
> >>
> >> Pros:
> >> * clearly readable dependency since code will `import
> >> org.apache.beam.private.guava21` and IDEs will
> understand it
> >> is a distinct lilbrary
> >> * can run tests on unpackaged classes, as long as deps
> are
> >> shaded or provided as jars
> >> * no mysterious action at a distance from inherited
> configuration
> >> Cons:
> >> * need to adjust imports
> >>
> >> Kenn
> >>
> >> On Mon, Dec 11, 2017 at 9:57 AM, Lukasz Cwik <
> lc...@google.com
> >> > wrote:
> >>
> >> I would suggest that either we use:
> >> 1) A common deps package containing shaded
> dependencies
> >> allows for
> >> Pros
> >> * doesn't require the user to build an uber jar
> >> Risks
> >> * dependencies package will keep growing even if
> something
> >> is or isn't needed by all of Apache Beam leading to
> a
> >> large jar anyways negating any space savings
> >>
> >> 2) Shade within each module to a common location
> like
> >> org.apache.beam.relocated.guava
> >> Pros
> >> * you only get the shaded dependencies of the
> things that
> >> are required
> >> * its one less dependency for users to manage
> >> Risks
> >> * requires an uber jar to be built to get the space
> >> savings (either by a user or a distribution of
> Apache
> >> Beam) otherwise we negate any space savings.
> >>
> >> If we either use a common relocation scheme or a
> >> dependencies jar then each relocation should
> 

Re: [DISCUSS] [Java] Private shaded dependency uber jars

2018-01-31 Thread Jean-Baptiste Onofré
+1, it was on my TODO for a while waiting the Java8 update.

Regards
JB

On 02/01/2018 06:56 AM, Romain Manni-Bucau wrote:
> Why not dropping guava for all beam codebase? With java 8 it is quite easy to 
> do
> it and avoid a bunch of conflicts. Did it in 2 projects with quite a good 
> result.
> 
> Le 1 févr. 2018 06:50, "Lukasz Cwik"  > a écrit :
> 
> Make sure to include the guava version in the artifact name so that we can
> have multiple vendored versions.
> 
> On Wed, Jan 31, 2018 at 9:16 PM, Kenneth Knowles  > wrote:
> 
> I didn't have time for this, but it just bit me. We definitely have
> Guava on the API surface of runner support code in ways that get
> incompatibly shaded. I will probably start "1a" by making a shaded
> library org.apache.beam:vendored-guava and starting to use it. It 
> sounds
> like there is generally unanimous support for that much, anyhow.
> 
> Kenn
> 
> On Wed, Dec 13, 2017 at 8:31 AM, Aljoscha Krettek  > wrote:
> 
> Thanks Ismaël for bringing up this discussion again!
> 
> I would be in favour of 1) and more specifically of 1a)
> 
> Aljoscha
> 
> 
>> On 12. Dec 2017, at 18:56, Lukasz Cwik > > wrote:
>>
>> You can always run tests on post shaded artifacts instead of the
>> compiled classes, it just requires us to change our maven 
>> surefire
>> / gradle test configurations but it is true that most IDEs would
>> behave better with a dependency jar unless you delegate all the
>> build/test actions to the build system and then it won't matter.
>>
>> On Mon, Dec 11, 2017 at 9:05 PM, Kenneth Knowles > > wrote:
>>
>> There's also, with additional overhead,
>>
>> 1a) A relocated and shipped package for each thing we want to
>> relocate. I think this has also been tried outside Beam...
>>
>> Pros:
>> * all the pros of 1) plus no bloat beyond what is necessary
>> Cons:
>> * abandons whitelist approach for public deps, reverting to
>> blacklist approach for trouble things like guava, so a bit
>> less principled
>>
>> For both 1) and 1a) I would add:
>>
>> Pros:
>> * clearly readable dependency since code will `import
>> org.apache.beam.private.guava21` and IDEs will understand it
>> is a distinct lilbrary
>> * can run tests on unpackaged classes, as long as deps are
>> shaded or provided as jars
>> * no mysterious action at a distance from inherited 
>> configuration
>> Cons:
>> * need to adjust imports
>>
>> Kenn
>>
>> On Mon, Dec 11, 2017 at 9:57 AM, Lukasz Cwik 
>> > > wrote:
>>
>> I would suggest that either we use:
>> 1) A common deps package containing shaded dependencies
>> allows for 
>> Pros
>> * doesn't require the user to build an uber jar
>> Risks
>> * dependencies package will keep growing even if 
>> something
>> is or isn't needed by all of Apache Beam leading to a
>> large jar anyways negating any space savings
>>
>> 2) Shade within each module to a common location like
>> org.apache.beam.relocated.guava
>> Pros
>> * you only get the shaded dependencies of the things that
>> are required
>> * its one less dependency for users to manage
>> Risks
>> * requires an uber jar to be built to get the space
>> savings (either by a user or a distribution of Apache
>> Beam) otherwise we negate any space savings.
>>
>> If we either use a common relocation scheme or a
>> dependencies jar then each relocation should specifically
>> contain the version number of the package because we 
>> would
>> like to allow for us to be using two different versions 
>> of
>> the same library.
>>
>> For the common deps package approach, should we check in
>> code where the imports contain the relocated location
>> (e.g. import 

Re: [DISCUSS] [Java] Private shaded dependency uber jars

2018-01-31 Thread Lukasz Cwik
That is an even better idea. A lot of guava constructs have been superseded
by stuff that was added to Java 8.

On Wed, Jan 31, 2018 at 9:56 PM, Romain Manni-Bucau 
wrote:

> Why not dropping guava for all beam codebase? With java 8 it is quite easy
> to do it and avoid a bunch of conflicts. Did it in 2 projects with quite a
> good result.
>
> Le 1 févr. 2018 06:50, "Lukasz Cwik"  a écrit :
>
>> Make sure to include the guava version in the artifact name so that we
>> can have multiple vendored versions.
>>
>> On Wed, Jan 31, 2018 at 9:16 PM, Kenneth Knowles  wrote:
>>
>>> I didn't have time for this, but it just bit me. We definitely have
>>> Guava on the API surface of runner support code in ways that get
>>> incompatibly shaded. I will probably start "1a" by making a shaded library
>>> org.apache.beam:vendored-guava and starting to use it. It sounds like there
>>> is generally unanimous support for that much, anyhow.
>>>
>>> Kenn
>>>
>>> On Wed, Dec 13, 2017 at 8:31 AM, Aljoscha Krettek 
>>> wrote:
>>>
 Thanks Ismaël for bringing up this discussion again!

 I would be in favour of 1) and more specifically of 1a)

 Aljoscha


 On 12. Dec 2017, at 18:56, Lukasz Cwik  wrote:

 You can always run tests on post shaded artifacts instead of the
 compiled classes, it just requires us to change our maven surefire / gradle
 test configurations but it is true that most IDEs would behave better with
 a dependency jar unless you delegate all the build/test actions to the
 build system and then it won't matter.

 On Mon, Dec 11, 2017 at 9:05 PM, Kenneth Knowles 
 wrote:

> There's also, with additional overhead,
>
> 1a) A relocated and shipped package for each thing we want to
> relocate. I think this has also been tried outside Beam...
>
> Pros:
> * all the pros of 1) plus no bloat beyond what is necessary
> Cons:
> * abandons whitelist approach for public deps, reverting to blacklist
> approach for trouble things like guava, so a bit less principled
>
> For both 1) and 1a) I would add:
>
> Pros:
> * clearly readable dependency since code will `import
> org.apache.beam.private.guava21` and IDEs will understand it is a
> distinct lilbrary
> * can run tests on unpackaged classes, as long as deps are shaded or
> provided as jars
> * no mysterious action at a distance from inherited configuration
> Cons:
> * need to adjust imports
>
> Kenn
>
> On Mon, Dec 11, 2017 at 9:57 AM, Lukasz Cwik  wrote:
>
>> I would suggest that either we use:
>> 1) A common deps package containing shaded dependencies allows for
>> Pros
>> * doesn't require the user to build an uber jar
>> Risks
>> * dependencies package will keep growing even if something is or
>> isn't needed by all of Apache Beam leading to a large jar anyways 
>> negating
>> any space savings
>>
>> 2) Shade within each module to a common location like
>> org.apache.beam.relocated.guava
>> Pros
>> * you only get the shaded dependencies of the things that are required
>> * its one less dependency for users to manage
>> Risks
>> * requires an uber jar to be built to get the space savings (either
>> by a user or a distribution of Apache Beam) otherwise we negate any space
>> savings.
>>
>> If we either use a common relocation scheme or a dependencies jar
>> then each relocation should specifically contain the version number of 
>> the
>> package because we would like to allow for us to be using two different
>> versions of the same library.
>>
>> For the common deps package approach, should we check in code where
>> the imports contain the relocated location (e.g. import
>> org.apache.beam.guava.20.0.com.google.common.collect.ImmutableList)?
>>
>>
>> On Mon, Dec 11, 2017 at 8:47 AM, Jean-Baptiste Onofré <
>> j...@nanthrax.net> wrote:
>>
>>> Thanks for bringing that back.
>>>
>>> Indeed guava is shaded in different uber-jar. Maybe we can have a
>>> common deps module that we include once (but the user will have to
>>> explicitly define the dep) ?
>>>
>>> Basically, what do you propose for protobuf (unfortunately, I don't
>>> see an obvious) ?
>>>
>>> Regards
>>> JB
>>>
>>>
>>> On 12/11/2017 05:35 PM, Ismaël Mejía wrote:
>>>
 Hello, I wanted to bring back this subject because I think we should
 take action on this and at least first have a shaded version of
 guava.
 I was playing with a toy project and I did the procedure we use to
 submit jars to a Hadoop cluster via Flink/Spark which involves
 creating an uber jar and I 

Re: [DISCUSS] [Java] Private shaded dependency uber jars

2018-01-31 Thread Romain Manni-Bucau
Why not dropping guava for all beam codebase? With java 8 it is quite easy
to do it and avoid a bunch of conflicts. Did it in 2 projects with quite a
good result.

Le 1 févr. 2018 06:50, "Lukasz Cwik"  a écrit :

> Make sure to include the guava version in the artifact name so that we can
> have multiple vendored versions.
>
> On Wed, Jan 31, 2018 at 9:16 PM, Kenneth Knowles  wrote:
>
>> I didn't have time for this, but it just bit me. We definitely have Guava
>> on the API surface of runner support code in ways that get incompatibly
>> shaded. I will probably start "1a" by making a shaded library
>> org.apache.beam:vendored-guava and starting to use it. It sounds like there
>> is generally unanimous support for that much, anyhow.
>>
>> Kenn
>>
>> On Wed, Dec 13, 2017 at 8:31 AM, Aljoscha Krettek 
>> wrote:
>>
>>> Thanks Ismaël for bringing up this discussion again!
>>>
>>> I would be in favour of 1) and more specifically of 1a)
>>>
>>> Aljoscha
>>>
>>>
>>> On 12. Dec 2017, at 18:56, Lukasz Cwik  wrote:
>>>
>>> You can always run tests on post shaded artifacts instead of the
>>> compiled classes, it just requires us to change our maven surefire / gradle
>>> test configurations but it is true that most IDEs would behave better with
>>> a dependency jar unless you delegate all the build/test actions to the
>>> build system and then it won't matter.
>>>
>>> On Mon, Dec 11, 2017 at 9:05 PM, Kenneth Knowles  wrote:
>>>
 There's also, with additional overhead,

 1a) A relocated and shipped package for each thing we want to relocate.
 I think this has also been tried outside Beam...

 Pros:
 * all the pros of 1) plus no bloat beyond what is necessary
 Cons:
 * abandons whitelist approach for public deps, reverting to blacklist
 approach for trouble things like guava, so a bit less principled

 For both 1) and 1a) I would add:

 Pros:
 * clearly readable dependency since code will `import
 org.apache.beam.private.guava21` and IDEs will understand it is a
 distinct lilbrary
 * can run tests on unpackaged classes, as long as deps are shaded or
 provided as jars
 * no mysterious action at a distance from inherited configuration
 Cons:
 * need to adjust imports

 Kenn

 On Mon, Dec 11, 2017 at 9:57 AM, Lukasz Cwik  wrote:

> I would suggest that either we use:
> 1) A common deps package containing shaded dependencies allows for
> Pros
> * doesn't require the user to build an uber jar
> Risks
> * dependencies package will keep growing even if something is or isn't
> needed by all of Apache Beam leading to a large jar anyways negating any
> space savings
>
> 2) Shade within each module to a common location like
> org.apache.beam.relocated.guava
> Pros
> * you only get the shaded dependencies of the things that are required
> * its one less dependency for users to manage
> Risks
> * requires an uber jar to be built to get the space savings (either by
> a user or a distribution of Apache Beam) otherwise we negate any space
> savings.
>
> If we either use a common relocation scheme or a dependencies jar then
> each relocation should specifically contain the version number of the
> package because we would like to allow for us to be using two different
> versions of the same library.
>
> For the common deps package approach, should we check in code where
> the imports contain the relocated location (e.g. import
> org.apache.beam.guava.20.0.com.google.common.collect.ImmutableList)?
>
>
> On Mon, Dec 11, 2017 at 8:47 AM, Jean-Baptiste Onofré  > wrote:
>
>> Thanks for bringing that back.
>>
>> Indeed guava is shaded in different uber-jar. Maybe we can have a
>> common deps module that we include once (but the user will have to
>> explicitly define the dep) ?
>>
>> Basically, what do you propose for protobuf (unfortunately, I don't
>> see an obvious) ?
>>
>> Regards
>> JB
>>
>>
>> On 12/11/2017 05:35 PM, Ismaël Mejía wrote:
>>
>>> Hello, I wanted to bring back this subject because I think we should
>>> take action on this and at least first have a shaded version of
>>> guava.
>>> I was playing with a toy project and I did the procedure we use to
>>> submit jars to a Hadoop cluster via Flink/Spark which involves
>>> creating an uber jar and I realized that the size of the jar was way
>>> bigger than I expected, and the fact that we shade guava in every
>>> module contributes to this. I found guava shaded on:
>>>
>>> sdks/java/core
>>> runners/core-construction-java
>>> runners/core-java
>>> model/job-management
>>> runners/spark
>>> 

Re: [DISCUSS] [Java] Private shaded dependency uber jars

2018-01-31 Thread Lukasz Cwik
Make sure to include the guava version in the artifact name so that we can
have multiple vendored versions.

On Wed, Jan 31, 2018 at 9:16 PM, Kenneth Knowles  wrote:

> I didn't have time for this, but it just bit me. We definitely have Guava
> on the API surface of runner support code in ways that get incompatibly
> shaded. I will probably start "1a" by making a shaded library
> org.apache.beam:vendored-guava and starting to use it. It sounds like there
> is generally unanimous support for that much, anyhow.
>
> Kenn
>
> On Wed, Dec 13, 2017 at 8:31 AM, Aljoscha Krettek 
> wrote:
>
>> Thanks Ismaël for bringing up this discussion again!
>>
>> I would be in favour of 1) and more specifically of 1a)
>>
>> Aljoscha
>>
>>
>> On 12. Dec 2017, at 18:56, Lukasz Cwik  wrote:
>>
>> You can always run tests on post shaded artifacts instead of the compiled
>> classes, it just requires us to change our maven surefire / gradle test
>> configurations but it is true that most IDEs would behave better with a
>> dependency jar unless you delegate all the build/test actions to the build
>> system and then it won't matter.
>>
>> On Mon, Dec 11, 2017 at 9:05 PM, Kenneth Knowles  wrote:
>>
>>> There's also, with additional overhead,
>>>
>>> 1a) A relocated and shipped package for each thing we want to relocate.
>>> I think this has also been tried outside Beam...
>>>
>>> Pros:
>>> * all the pros of 1) plus no bloat beyond what is necessary
>>> Cons:
>>> * abandons whitelist approach for public deps, reverting to blacklist
>>> approach for trouble things like guava, so a bit less principled
>>>
>>> For both 1) and 1a) I would add:
>>>
>>> Pros:
>>> * clearly readable dependency since code will `import
>>> org.apache.beam.private.guava21` and IDEs will understand it is a
>>> distinct lilbrary
>>> * can run tests on unpackaged classes, as long as deps are shaded or
>>> provided as jars
>>> * no mysterious action at a distance from inherited configuration
>>> Cons:
>>> * need to adjust imports
>>>
>>> Kenn
>>>
>>> On Mon, Dec 11, 2017 at 9:57 AM, Lukasz Cwik  wrote:
>>>
 I would suggest that either we use:
 1) A common deps package containing shaded dependencies allows for
 Pros
 * doesn't require the user to build an uber jar
 Risks
 * dependencies package will keep growing even if something is or isn't
 needed by all of Apache Beam leading to a large jar anyways negating any
 space savings

 2) Shade within each module to a common location like
 org.apache.beam.relocated.guava
 Pros
 * you only get the shaded dependencies of the things that are required
 * its one less dependency for users to manage
 Risks
 * requires an uber jar to be built to get the space savings (either by
 a user or a distribution of Apache Beam) otherwise we negate any space
 savings.

 If we either use a common relocation scheme or a dependencies jar then
 each relocation should specifically contain the version number of the
 package because we would like to allow for us to be using two different
 versions of the same library.

 For the common deps package approach, should we check in code where the
 imports contain the relocated location (e.g. import
 org.apache.beam.guava.20.0.com.google.common.collect.ImmutableList)?


 On Mon, Dec 11, 2017 at 8:47 AM, Jean-Baptiste Onofré 
 wrote:

> Thanks for bringing that back.
>
> Indeed guava is shaded in different uber-jar. Maybe we can have a
> common deps module that we include once (but the user will have to
> explicitly define the dep) ?
>
> Basically, what do you propose for protobuf (unfortunately, I don't
> see an obvious) ?
>
> Regards
> JB
>
>
> On 12/11/2017 05:35 PM, Ismaël Mejía wrote:
>
>> Hello, I wanted to bring back this subject because I think we should
>> take action on this and at least first have a shaded version of guava.
>> I was playing with a toy project and I did the procedure we use to
>> submit jars to a Hadoop cluster via Flink/Spark which involves
>> creating an uber jar and I realized that the size of the jar was way
>> bigger than I expected, and the fact that we shade guava in every
>> module contributes to this. I found guava shaded on:
>>
>> sdks/java/core
>> runners/core-construction-java
>> runners/core-java
>> model/job-management
>> runners/spark
>> sdks/java/io/hadoop-file-system
>> sdks/java/io/kafka
>>
>> This means at least 6 times more of the size it should which counts in
>> around 15MB more (2.4MB*6 deps) of extra weight that we can simply
>> reduce by using a shaded version of the library.
>>
>> I add this point to the previous ones mentioned during the discussion
>> because this 

Re: [DISCUSS] [Java] Private shaded dependency uber jars

2018-01-31 Thread Kenneth Knowles
I didn't have time for this, but it just bit me. We definitely have Guava
on the API surface of runner support code in ways that get incompatibly
shaded. I will probably start "1a" by making a shaded library
org.apache.beam:vendored-guava and starting to use it. It sounds like there
is generally unanimous support for that much, anyhow.

Kenn

On Wed, Dec 13, 2017 at 8:31 AM, Aljoscha Krettek 
wrote:

> Thanks Ismaël for bringing up this discussion again!
>
> I would be in favour of 1) and more specifically of 1a)
>
> Aljoscha
>
>
> On 12. Dec 2017, at 18:56, Lukasz Cwik  wrote:
>
> You can always run tests on post shaded artifacts instead of the compiled
> classes, it just requires us to change our maven surefire / gradle test
> configurations but it is true that most IDEs would behave better with a
> dependency jar unless you delegate all the build/test actions to the build
> system and then it won't matter.
>
> On Mon, Dec 11, 2017 at 9:05 PM, Kenneth Knowles  wrote:
>
>> There's also, with additional overhead,
>>
>> 1a) A relocated and shipped package for each thing we want to relocate. I
>> think this has also been tried outside Beam...
>>
>> Pros:
>> * all the pros of 1) plus no bloat beyond what is necessary
>> Cons:
>> * abandons whitelist approach for public deps, reverting to blacklist
>> approach for trouble things like guava, so a bit less principled
>>
>> For both 1) and 1a) I would add:
>>
>> Pros:
>> * clearly readable dependency since code will `import
>> org.apache.beam.private.guava21` and IDEs will understand it is a
>> distinct lilbrary
>> * can run tests on unpackaged classes, as long as deps are shaded or
>> provided as jars
>> * no mysterious action at a distance from inherited configuration
>> Cons:
>> * need to adjust imports
>>
>> Kenn
>>
>> On Mon, Dec 11, 2017 at 9:57 AM, Lukasz Cwik  wrote:
>>
>>> I would suggest that either we use:
>>> 1) A common deps package containing shaded dependencies allows for
>>> Pros
>>> * doesn't require the user to build an uber jar
>>> Risks
>>> * dependencies package will keep growing even if something is or isn't
>>> needed by all of Apache Beam leading to a large jar anyways negating any
>>> space savings
>>>
>>> 2) Shade within each module to a common location like
>>> org.apache.beam.relocated.guava
>>> Pros
>>> * you only get the shaded dependencies of the things that are required
>>> * its one less dependency for users to manage
>>> Risks
>>> * requires an uber jar to be built to get the space savings (either by a
>>> user or a distribution of Apache Beam) otherwise we negate any space
>>> savings.
>>>
>>> If we either use a common relocation scheme or a dependencies jar then
>>> each relocation should specifically contain the version number of the
>>> package because we would like to allow for us to be using two different
>>> versions of the same library.
>>>
>>> For the common deps package approach, should we check in code where the
>>> imports contain the relocated location (e.g. import
>>> org.apache.beam.guava.20.0.com.google.common.collect.ImmutableList)?
>>>
>>>
>>> On Mon, Dec 11, 2017 at 8:47 AM, Jean-Baptiste Onofré 
>>> wrote:
>>>
 Thanks for bringing that back.

 Indeed guava is shaded in different uber-jar. Maybe we can have a
 common deps module that we include once (but the user will have to
 explicitly define the dep) ?

 Basically, what do you propose for protobuf (unfortunately, I don't see
 an obvious) ?

 Regards
 JB


 On 12/11/2017 05:35 PM, Ismaël Mejía wrote:

> Hello, I wanted to bring back this subject because I think we should
> take action on this and at least first have a shaded version of guava.
> I was playing with a toy project and I did the procedure we use to
> submit jars to a Hadoop cluster via Flink/Spark which involves
> creating an uber jar and I realized that the size of the jar was way
> bigger than I expected, and the fact that we shade guava in every
> module contributes to this. I found guava shaded on:
>
> sdks/java/core
> runners/core-construction-java
> runners/core-java
> model/job-management
> runners/spark
> sdks/java/io/hadoop-file-system
> sdks/java/io/kafka
>
> This means at least 6 times more of the size it should which counts in
> around 15MB more (2.4MB*6 deps) of extra weight that we can simply
> reduce by using a shaded version of the library.
>
> I add this point to the previous ones mentioned during the discussion
> because this goes against the end user experience and affects us all
> (devs/users).
>
> Another question is if we should shade (and how) protocol buffers
> because now with the portability work we are exposing it closer to the
> end users. I say this because I also found an issue while running a
> job on 

Re: [DISCUSS] [Java] Private shaded dependency uber jars

2017-12-11 Thread Lukasz Cwik
I would suggest that either we use:
1) A common deps package containing shaded dependencies allows for
Pros
* doesn't require the user to build an uber jar
Risks
* dependencies package will keep growing even if something is or isn't
needed by all of Apache Beam leading to a large jar anyways negating any
space savings

2) Shade within each module to a common location like
org.apache.beam.relocated.guava
Pros
* you only get the shaded dependencies of the things that are required
* its one less dependency for users to manage
Risks
* requires an uber jar to be built to get the space savings (either by a
user or a distribution of Apache Beam) otherwise we negate any space
savings.

If we either use a common relocation scheme or a dependencies jar then each
relocation should specifically contain the version number of the package
because we would like to allow for us to be using two different versions of
the same library.

For the common deps package approach, should we check in code where the
imports contain the relocated location (e.g. import
org.apache.beam.guava.20.0.com.google.common.collect.ImmutableList)?


On Mon, Dec 11, 2017 at 8:47 AM, Jean-Baptiste Onofré 
wrote:

> Thanks for bringing that back.
>
> Indeed guava is shaded in different uber-jar. Maybe we can have a common
> deps module that we include once (but the user will have to explicitly
> define the dep) ?
>
> Basically, what do you propose for protobuf (unfortunately, I don't see an
> obvious) ?
>
> Regards
> JB
>
>
> On 12/11/2017 05:35 PM, Ismaël Mejía wrote:
>
>> Hello, I wanted to bring back this subject because I think we should
>> take action on this and at least first have a shaded version of guava.
>> I was playing with a toy project and I did the procedure we use to
>> submit jars to a Hadoop cluster via Flink/Spark which involves
>> creating an uber jar and I realized that the size of the jar was way
>> bigger than I expected, and the fact that we shade guava in every
>> module contributes to this. I found guava shaded on:
>>
>> sdks/java/core
>> runners/core-construction-java
>> runners/core-java
>> model/job-management
>> runners/spark
>> sdks/java/io/hadoop-file-system
>> sdks/java/io/kafka
>>
>> This means at least 6 times more of the size it should which counts in
>> around 15MB more (2.4MB*6 deps) of extra weight that we can simply
>> reduce by using a shaded version of the library.
>>
>> I add this point to the previous ones mentioned during the discussion
>> because this goes against the end user experience and affects us all
>> (devs/users).
>>
>> Another question is if we should shade (and how) protocol buffers
>> because now with the portability work we are exposing it closer to the
>> end users. I say this because I also found an issue while running a
>> job on YARN with the spark runner because hadoop-common includes
>> protobuf-java 2 and I had to explicitly provide protocol-buffers 3 as
>> a dependency to be able to use triggers (note the Spark runner
>> translates them using some method from runners/core-java). Since
>> hadoop-common is provided in the cluster with the older version of
>> protobuf, I am afraid that this will bite us in the future.
>>
>> Ismaël
>>
>> ps. There is already a JIRA for that shading for protobuf on
>> hadoop-common but this is not coming until version 3 is out.
>> https://issues.apache.org/jira/browse/HADOOP-13136
>>
>> ps2. Extra curious situation is to see that the dataflow-runner ends
>> up having guava shaded twice via its shaded version on
>> core-construction-java.
>>
>> ps3. Of course this message means a de-facto +1 at least to do it for
>> guava and evaluate it for other libs.
>>
>>
>> On Tue, Oct 17, 2017 at 7:29 PM, Lukasz Cwik 
>> wrote:
>>
>>> An issue to call out is how to deal with our generated code (.avro and
>>> .proto) as I don't believe those plugins allow you to generate code
>>> using a
>>> shaded package prefix on imports.
>>>
>>> On Tue, Oct 17, 2017 at 10:28 AM, Thomas Groh 
>>> wrote:
>>>
>>> +1 to the goal. I'm hugely in favor of not doing the same shading work
 every time for dependencies we know we'll use.

 This also means that if we end up pulling in transitive dependencies we
 don't want in any particular module we can avoid having to adjust our
 repackaging strategy for that module - which I have run into face-first
 in
 the past.

 On Tue, Oct 17, 2017 at 9:48 AM, Kenneth Knowles 
 wrote:

 Hi all,
>
> Shading is a big part of how we keep our dependencies sane in Beam. But
> downsides: shading is super slow, causes massive jar bloat, and kind of
> hard to get right because artifacts and namespaces are not 1-to-1.
>
> I know that some communities distribute their own shaded distributions
> of
> dependencies. I had a thought about doing something similar that I
> wanted

Re: [DISCUSS] [Java] Private shaded dependency uber jars

2017-12-11 Thread Jean-Baptiste Onofré

Thanks for bringing that back.

Indeed guava is shaded in different uber-jar. Maybe we can have a common deps 
module that we include once (but the user will have to explicitly define the dep) ?


Basically, what do you propose for protobuf (unfortunately, I don't see an 
obvious) ?


Regards
JB

On 12/11/2017 05:35 PM, Ismaël Mejía wrote:

Hello, I wanted to bring back this subject because I think we should
take action on this and at least first have a shaded version of guava.
I was playing with a toy project and I did the procedure we use to
submit jars to a Hadoop cluster via Flink/Spark which involves
creating an uber jar and I realized that the size of the jar was way
bigger than I expected, and the fact that we shade guava in every
module contributes to this. I found guava shaded on:

sdks/java/core
runners/core-construction-java
runners/core-java
model/job-management
runners/spark
sdks/java/io/hadoop-file-system
sdks/java/io/kafka

This means at least 6 times more of the size it should which counts in
around 15MB more (2.4MB*6 deps) of extra weight that we can simply
reduce by using a shaded version of the library.

I add this point to the previous ones mentioned during the discussion
because this goes against the end user experience and affects us all
(devs/users).

Another question is if we should shade (and how) protocol buffers
because now with the portability work we are exposing it closer to the
end users. I say this because I also found an issue while running a
job on YARN with the spark runner because hadoop-common includes
protobuf-java 2 and I had to explicitly provide protocol-buffers 3 as
a dependency to be able to use triggers (note the Spark runner
translates them using some method from runners/core-java). Since
hadoop-common is provided in the cluster with the older version of
protobuf, I am afraid that this will bite us in the future.

Ismaël

ps. There is already a JIRA for that shading for protobuf on
hadoop-common but this is not coming until version 3 is out.
https://issues.apache.org/jira/browse/HADOOP-13136

ps2. Extra curious situation is to see that the dataflow-runner ends
up having guava shaded twice via its shaded version on
core-construction-java.

ps3. Of course this message means a de-facto +1 at least to do it for
guava and evaluate it for other libs.


On Tue, Oct 17, 2017 at 7:29 PM, Lukasz Cwik  wrote:

An issue to call out is how to deal with our generated code (.avro and
.proto) as I don't believe those plugins allow you to generate code using a
shaded package prefix on imports.

On Tue, Oct 17, 2017 at 10:28 AM, Thomas Groh 
wrote:


+1 to the goal. I'm hugely in favor of not doing the same shading work
every time for dependencies we know we'll use.

This also means that if we end up pulling in transitive dependencies we
don't want in any particular module we can avoid having to adjust our
repackaging strategy for that module - which I have run into face-first in
the past.

On Tue, Oct 17, 2017 at 9:48 AM, Kenneth Knowles 
wrote:


Hi all,

Shading is a big part of how we keep our dependencies sane in Beam. But
downsides: shading is super slow, causes massive jar bloat, and kind of
hard to get right because artifacts and namespaces are not 1-to-1.

I know that some communities distribute their own shaded distributions of
dependencies. I had a thought about doing something similar that I wanted
to throw out there for people to poke holes in.

To set the scene, here is how I view shading:

  - A module has public dependencies and private dependencies.
  - Public deps are used for data interchange; users must share these

deps.

  - Private deps are just functionality and can be hidden (in our case,
relocated + bundled)
  - It isn't necessarily that simple, because public and private deps

might

interact in higher-order ways ("public" is contagious)

Shading is an implementation detail of expressing these characteristics.

We

use shading selectively because of its downsides I mentioned above.

But what about this idea: Introduce shaded deps as a single separate
artifact.

  - sdks/java/private-deps: bundled uber jar with relocated versions of
everything we want to shade

  - sdks/java/core and sdks/java/harness: no relocation or bundling -
depends on `beam-sdks-java-private-deps` and imports like
`org.apache.beam.sdk.private.com.google.common` directly (this is what
they
are rewritten to

Some benefits

  - much faster builds of other modules
  - only one shaded uber jar
  - rare/no rebuilds of the uber jar
  - can use maven enforcer to forbid imports like com.google.common
  - configuration all in one place
  - no automated rewriting of our real code, which has led to some major
confusion
  - easy to implement incrementally

Downsides:

  - plenty of effort work to get there
  - unclear how many different such deps modules we need; sharing them

could

get weird
  - if we hit a 

Re: [DISCUSS] [Java] Private shaded dependency uber jars

2017-12-11 Thread Ismaël Mejía
Hello, I wanted to bring back this subject because I think we should
take action on this and at least first have a shaded version of guava.
I was playing with a toy project and I did the procedure we use to
submit jars to a Hadoop cluster via Flink/Spark which involves
creating an uber jar and I realized that the size of the jar was way
bigger than I expected, and the fact that we shade guava in every
module contributes to this. I found guava shaded on:

sdks/java/core
runners/core-construction-java
runners/core-java
model/job-management
runners/spark
sdks/java/io/hadoop-file-system
sdks/java/io/kafka

This means at least 6 times more of the size it should which counts in
around 15MB more (2.4MB*6 deps) of extra weight that we can simply
reduce by using a shaded version of the library.

I add this point to the previous ones mentioned during the discussion
because this goes against the end user experience and affects us all
(devs/users).

Another question is if we should shade (and how) protocol buffers
because now with the portability work we are exposing it closer to the
end users. I say this because I also found an issue while running a
job on YARN with the spark runner because hadoop-common includes
protobuf-java 2 and I had to explicitly provide protocol-buffers 3 as
a dependency to be able to use triggers (note the Spark runner
translates them using some method from runners/core-java). Since
hadoop-common is provided in the cluster with the older version of
protobuf, I am afraid that this will bite us in the future.

Ismaël

ps. There is already a JIRA for that shading for protobuf on
hadoop-common but this is not coming until version 3 is out.
https://issues.apache.org/jira/browse/HADOOP-13136

ps2. Extra curious situation is to see that the dataflow-runner ends
up having guava shaded twice via its shaded version on
core-construction-java.

ps3. Of course this message means a de-facto +1 at least to do it for
guava and evaluate it for other libs.


On Tue, Oct 17, 2017 at 7:29 PM, Lukasz Cwik  wrote:
> An issue to call out is how to deal with our generated code (.avro and
> .proto) as I don't believe those plugins allow you to generate code using a
> shaded package prefix on imports.
>
> On Tue, Oct 17, 2017 at 10:28 AM, Thomas Groh 
> wrote:
>
>> +1 to the goal. I'm hugely in favor of not doing the same shading work
>> every time for dependencies we know we'll use.
>>
>> This also means that if we end up pulling in transitive dependencies we
>> don't want in any particular module we can avoid having to adjust our
>> repackaging strategy for that module - which I have run into face-first in
>> the past.
>>
>> On Tue, Oct 17, 2017 at 9:48 AM, Kenneth Knowles 
>> wrote:
>>
>> > Hi all,
>> >
>> > Shading is a big part of how we keep our dependencies sane in Beam. But
>> > downsides: shading is super slow, causes massive jar bloat, and kind of
>> > hard to get right because artifacts and namespaces are not 1-to-1.
>> >
>> > I know that some communities distribute their own shaded distributions of
>> > dependencies. I had a thought about doing something similar that I wanted
>> > to throw out there for people to poke holes in.
>> >
>> > To set the scene, here is how I view shading:
>> >
>> >  - A module has public dependencies and private dependencies.
>> >  - Public deps are used for data interchange; users must share these
>> deps.
>> >  - Private deps are just functionality and can be hidden (in our case,
>> > relocated + bundled)
>> >  - It isn't necessarily that simple, because public and private deps
>> might
>> > interact in higher-order ways ("public" is contagious)
>> >
>> > Shading is an implementation detail of expressing these characteristics.
>> We
>> > use shading selectively because of its downsides I mentioned above.
>> >
>> > But what about this idea: Introduce shaded deps as a single separate
>> > artifact.
>> >
>> >  - sdks/java/private-deps: bundled uber jar with relocated versions of
>> > everything we want to shade
>> >
>> >  - sdks/java/core and sdks/java/harness: no relocation or bundling -
>> > depends on `beam-sdks-java-private-deps` and imports like
>> > `org.apache.beam.sdk.private.com.google.common` directly (this is what
>> > they
>> > are rewritten to
>> >
>> > Some benefits
>> >
>> >  - much faster builds of other modules
>> >  - only one shaded uber jar
>> >  - rare/no rebuilds of the uber jar
>> >  - can use maven enforcer to forbid imports like com.google.common
>> >  - configuration all in one place
>> >  - no automated rewriting of our real code, which has led to some major
>> > confusion
>> >  - easy to implement incrementally
>> >
>> > Downsides:
>> >
>> >  - plenty of effort work to get there
>> >  - unclear how many different such deps modules we need; sharing them
>> could
>> > get weird
>> >  - if we hit a roadblock, we will have committed a lot of time
>> >
>> > Just something I was 

Re: [DISCUSS] [Java] Private shaded dependency uber jars

2017-10-17 Thread Thomas Groh
+1 to the goal. I'm hugely in favor of not doing the same shading work
every time for dependencies we know we'll use.

This also means that if we end up pulling in transitive dependencies we
don't want in any particular module we can avoid having to adjust our
repackaging strategy for that module - which I have run into face-first in
the past.

On Tue, Oct 17, 2017 at 9:48 AM, Kenneth Knowles 
wrote:

> Hi all,
>
> Shading is a big part of how we keep our dependencies sane in Beam. But
> downsides: shading is super slow, causes massive jar bloat, and kind of
> hard to get right because artifacts and namespaces are not 1-to-1.
>
> I know that some communities distribute their own shaded distributions of
> dependencies. I had a thought about doing something similar that I wanted
> to throw out there for people to poke holes in.
>
> To set the scene, here is how I view shading:
>
>  - A module has public dependencies and private dependencies.
>  - Public deps are used for data interchange; users must share these deps.
>  - Private deps are just functionality and can be hidden (in our case,
> relocated + bundled)
>  - It isn't necessarily that simple, because public and private deps might
> interact in higher-order ways ("public" is contagious)
>
> Shading is an implementation detail of expressing these characteristics. We
> use shading selectively because of its downsides I mentioned above.
>
> But what about this idea: Introduce shaded deps as a single separate
> artifact.
>
>  - sdks/java/private-deps: bundled uber jar with relocated versions of
> everything we want to shade
>
>  - sdks/java/core and sdks/java/harness: no relocation or bundling -
> depends on `beam-sdks-java-private-deps` and imports like
> `org.apache.beam.sdk.private.com.google.common` directly (this is what
> they
> are rewritten to
>
> Some benefits
>
>  - much faster builds of other modules
>  - only one shaded uber jar
>  - rare/no rebuilds of the uber jar
>  - can use maven enforcer to forbid imports like com.google.common
>  - configuration all in one place
>  - no automated rewriting of our real code, which has led to some major
> confusion
>  - easy to implement incrementally
>
> Downsides:
>
>  - plenty of effort work to get there
>  - unclear how many different such deps modules we need; sharing them could
> get weird
>  - if we hit a roadblock, we will have committed a lot of time
>
> Just something I was musing as I spent another evening waiting for slow
> builds to try to confirm changes to brittle poms.
>
> Kenn
>