Re: Gradle Status [April 6]

2018-04-06 Thread Scott Wegner
Here's an end-of-day update on migration work:

* Snapshot unsigned dailies and signed release builds are working (!!).
PR/5048 [1] merges changes from Luke's branch
  * Python precommit is failing; will investigate on Monday
* All precommits are Gradle-only
* All postcommits except performance tests and Java_JDK_Versions_Test use
Gradle (after PR/5047 [2] is merged)
* Nightly snapshot release using Gradle is ready; needs PR/5048 to be
merged before switching
* ValidatesRunner_Spark is failing consistently; investigating

Thanks for another productive day of hacking. I'll pick up again on Monday.

[1] https://github.com/apache/beam/pull/5048
[2] https://github.com/apache/beam/pull/5047


On Fri, Apr 6, 2018 at 11:24 AM Romain Manni-Bucau 
wrote:

> Why not build a zip per runner with its stack, point to that zip, and let
> Beam lazy-load the runner:
>
> --runner=LazyRunner --lazyRunnerDir=... --lazyRunnerOptions=... (or the
> fromSystemProperties() if it gets merged a day ;))
>
> On Apr 6, 2018 at 20:21, "Kenneth Knowles"  wrote:
>
>> I'm working on finding a solution for launching the Nexmark suite with
>> each runner. This doesn't have to be done via Gradle, but we anyhow need
>> built artifacts that don't require user classpath intervention.
>>
>> It looks to me like the examples are also missing this - they have
>> separate configuration e.g. sparkRunnerPreCommit but that is overspecified
>> compared to a free-form launching of a main() program with a runner profile.
>>
>> On Fri, Apr 6, 2018 at 11:09 AM Lukasz Cwik  wrote:
>>
>>> Romain, are you talking about the profiles that exist as part of the
>>> archetype examples?
>>>
>>> If so, then those still exist and haven't been changed. If not, can you
>>> provide a link to the profile in a pom file to be clearer?
>>>
>>> On Fri, Apr 6, 2018 at 12:40 PM Romain Manni-Bucau <
>>> rmannibu...@gmail.com> wrote:
>>>
 Hi Scott,

 is it right that 2 doesn't handle the hierarchy anymore and that it
 doesn't handle profiles for runners as it is currently with maven?


 Romain Manni-Bucau
 @rmannibucau  |  Blog
  | Old Blog
  | Github
  | LinkedIn
  | Book
 

 2018-04-06 18:32 GMT+02:00 Scott Wegner :

> I wanted to start a thread to summarize the current state of Gradle
> migration. We've made lots of good progress so far this week. Here's the
> status from what I can tell-- please add or correct anything I missed:
>
> * Release artifacts can be built and published for Snapshot and
> official releases [1]
> * Gradle-generated releases have been validated with the Apache
> Beam archetype generation quickstart; still needs additional validation.
> * Generated release pom files have correct project metadata [2]
> * The python pre-commits are now working in Gradle [3]
> * Ismaël has started a collaborative doc of Gradle tips [4] as we all
> learn the new system-- please add your own. This will eventually feed into
> official documentation on the website.
> * Łukasz Gajowy is working on migrating performance testing framework
> [5]
> * Daniel is working on updating documentation to refer to Gradle
> instead of maven
>
> If I missed anything, please add it to this thread.
>
> The general roadmap we're working towards is:
> (a) Publish release artifacts with Gradle (SNAPSHOT and signed
> releases)
> (b) Postcommits migrated to Gradle
> (c) Migrate documentation from maven to Gradle
> (d) Migrate perfkit suites to use Gradle
>
> For those of you that are hacking: thanks for your help so far!
> Progress is being roughly tracked on the Kanban [6]; please make sure the
> issues assigned to you are up-to-date. Many of the changes are staged on
> lukecwik's local branch [7]; we'll work on merging them back soon.
>
>
> [1] https://github.com/lukecwik/incubator-beam/pull/7
> [2] https://github.com/lukecwik/incubator-beam/pull/3
> [3] https://github.com/apache/beam/pull/5032
> [4]
> https://docs.google.com/document/d/1wR56Jef3XIPwj4DFzQKznuGPM3JDfRDVkxzeDlbdVSQ/edit
> [5] https://github.com/apache/beam/pull/5003
> [6]
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=242
> [7] https://github.com/lukecwik/incubator-beam/tree/gradle


Re: [PROPOSAL] Python 3 support

2018-04-06 Thread Reuven Lax
I had a similar problem.

On Fri, Apr 6, 2018, 6:23 PM Ahmet Altay  wrote:

> I tried to create a shared kanban board but I failed. I think I am lacking
> some permission to create a shared filter. Could someone help with creating
> this?
>
> The filter I planned to use was "project = BEAM AND (parent = BEAM-2784 OR
> parent = BEAM-1251) ORDER BY Rank ASC"
>
> Ahmet
>
> On Fri, Apr 6, 2018 at 5:45 AM, Robbe Sneyders 
> wrote:
>
>> Hi all,
>>
>> I don't seem to have the permissions to create a Kanban board or even
>> assign tasks to myself. Who could help me with this?
>>
>> I've updated the coders package pull request [1] and added the applied
>> strategy to the proposal document [2].
>> It would be great to get some feedback on this, so we can start moving
>> forward with other subpackages.
>>
>> Kind regards,
>> Robbe
>>
>> [1] https://github.com/apache/beam/pull/4990
>> [2]
>> https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
>>
>>
>> On Mon, 2 Apr 2018 at 21:07 Robbe Sneyders  wrote:
>>
>>> Hello Robert,
>>>
>>> I think a Kanban board on Jira as proposed by Ahmet can be helpful for
>>> this. I'll look into setting one up tomorrow.
>>>
>>> In the meantime, you can find the first pull request with the updated
>>> coders package here:
>>> https://github.com/apache/beam/pull/4990
>>>
>>> Kind regards,
>>> Robbe
>>>
>>> On Fri, 30 Mar 2018 at 18:01 Robert Bradshaw 
>>> wrote:
>>>
 On Fri, Mar 30, 2018 at 8:39 AM Robbe Sneyders 
 wrote:

> Thanks Ahmet and Robert,
>
> I think we can work on different subpackages in parallel, but it's
> important to apply the same strategy everywhere. I'm currently working on
> applying step 1 (was mostly done already) and 2 of the proposal to the
> coders subpackage to create a first pull request. We can then discuss the
> applied strategy in detail before merging and applying it to the other
> subpackages.
>

 Sounds good. Again, could you document (in a more permanent/easy to
 look up state than email) when packages are started/done?


> This strategy also includes the choice of automated tools. I'm
> focusing on writing python 3 code with python 2 compatibility, which means
> depending on the future package instead of the six package (which is
> already used in some places in the current code base). I have already
> noticed that this indeed requires a lot of manual work after running the
> automated script.
> The future package supports python 3.3+ compatibility, so I don't
> think there is a higher cost supporting 3.4 compared to 3.5+.
>

 Sure. It may incur a higher maintenance burden long-term though.
 (Basically, if we go out the door with 3.4 it's a promise to support it for
 some time to come.)


> I have already added a tox environment to run pylint2 with the --py3k
> argument per updated subpackage, which should help avoid regression 
> between
> step 2 and step 3 of the proposal. This update will be pushed with the
> first pull request.
>
> Kind regards,
> Robbe
>
>
> On Fri, 30 Mar 2018 at 02:22 Robert Bradshaw 
> wrote:
>
>> Thank you, Robbe, for your offer to help with contribution here. I
>> read over your doc and the one thing I'd like to add is that this work is
>> very parallelizable, but if we have enough people looking at it we'll 
>> want
>> some way to coordinate so as to not overlap work (or just waste time
>> discovering what's been done). Tracking individual JIRAs and PRs gets
>> unwieldy, perhaps a spreadsheet with modules/packages on one axis and the
>> various automated/manual conversions along the other would be helpful?
>>
>> A note on automated tools, they're sometimes overly conservative, so
>> we should be sure to review the changes manually. (A typical example of
>> this is unnecessarily importing six.moves.xrange when there was no big
>> reason to use xrange over range in Python 2, or conversely using
>> list(range(...)) in Python 3.)
>>
>> Also, +1 to targeting 3.4+ and upgrading tox to prevent regressions.
>> If there's a cost to supporting 3.4 as opposed to requiring 3.5+ we 
>> should
>> identify it and decide that before widespread announcement.
>>
>> On Tue, Mar 27, 2018 at 2:27 PM Ahmet Altay  wrote:
>>
>>>
>>>
>>> On Tue, Mar 27, 2018 at 7:12 AM, Holden Karau 
>>> wrote:
>>>

 On Tue, Mar 27, 2018 at 4:27 AM Robbe Sneyders <
 robbe.sneyd...@ml6.eu> wrote:

> Hi Anand,
>
> Thanks for the feedback.
>
> It should be no problem to run everything on DataflowRunner as
> well.
> Are there any performance tests in place to check for performance
> regressions?
>

>>> Yes there is a suite (
>>>

Re: [PROPOSAL] Python 3 support

2018-04-06 Thread Ahmet Altay
I tried to create a shared kanban board but I failed. I think I am lacking
some permission to create a shared filter. Could someone help with creating
this?

The filter I planned to use was "project = BEAM AND (parent = BEAM-2784 OR
parent = BEAM-1251) ORDER BY Rank ASC"

Ahmet

On Fri, Apr 6, 2018 at 5:45 AM, Robbe Sneyders 
wrote:

> Hi all,
>
> I don't seem to have the permissions to create a Kanban board or even
> assign tasks to myself. Who could help me with this?
>
> I've updated the coders package pull request [1] and added the applied
> strategy to the proposal document [2].
> It would be great to get some feedback on this, so we can start moving
> forward with other subpackages.
>
> Kind regards,
> Robbe
>
> [1] https://github.com/apache/beam/pull/4990
> [2] https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
>
> On Mon, 2 Apr 2018 at 21:07 Robbe Sneyders  wrote:
>
>> Hello Robert,
>>
>> I think a Kanban board on Jira as proposed by Ahmet can be helpful for
>> this. I'll look into setting one up tomorrow.
>>
>> In the meantime, you can find the first pull request with the updated
>> coders package here:
>> https://github.com/apache/beam/pull/4990
>>
>> Kind regards,
>> Robbe
>>
>> On Fri, 30 Mar 2018 at 18:01 Robert Bradshaw  wrote:
>>
>>> On Fri, Mar 30, 2018 at 8:39 AM Robbe Sneyders 
>>> wrote:
>>>
 Thanks Ahmet and Robert,

 I think we can work on different subpackages in parallel, but it's
 important to apply the same strategy everywhere. I'm currently working on
 applying step 1 (was mostly done already) and 2 of the proposal to the
 coders subpackage to create a first pull request. We can then discuss the
 applied strategy in detail before merging and applying it to the other
 subpackages.

>>>
>>> Sounds good. Again, could you document (in a more permanent/easy to look
>>> up state than email) when packages are started/done?
>>>
>>>
 This strategy also includes the choice of automated tools. I'm focusing
 on writing python 3 code with python 2 compatibility, which means depending
 on the future package instead of the six package (which is already used in
 some places in the current code base). I have already noticed that this
 indeed requires a lot of manual work after running the automated script.
 The future package supports python 3.3+ compatibility, so I don't think
 there is a higher cost supporting 3.4 compared to 3.5+.

>>>
>>> Sure. It may incur a higher maintenance burden long-term though.
>>> (Basically, if we go out the door with 3.4 it's a promise to support it for
>>> some time to come.)
>>>
>>>
 I have already added a tox environment to run pylint2 with the --py3k
 argument per updated subpackage, which should help avoid regression between
 step 2 and step 3 of the proposal. This update will be pushed with the
 first pull request.

 Kind regards,
 Robbe


 On Fri, 30 Mar 2018 at 02:22 Robert Bradshaw 
 wrote:

> Thank you, Robbe, for your offer to help with contribution here. I
> read over your doc and the one thing I'd like to add is that this work is
> very parallelizable, but if we have enough people looking at it we'll want
> some way to coordinate so as to not overlap work (or just waste time
> discovering what's been done). Tracking individual JIRAs and PRs gets
> unwieldy, perhaps a spreadsheet with modules/packages on one axis and the
> various automated/manual conversions along the other would be helpful?
>
> A note on automated tools, they're sometimes overly conservative, so
> we should be sure to review the changes manually. (A typical example of
> this is unnecessarily importing six.moves.xrange when there was no big
> reason to use xrange over range in Python 2, or conversely using
> list(range(...)) in Python 3.)
>
> Also, +1 to targeting 3.4+ and upgrading tox to prevent regressions.
> If there's a cost to supporting 3.4 as opposed to requiring 3.5+ we should
> identify it and decide that before widespread announcement.
>
> On Tue, Mar 27, 2018 at 2:27 PM Ahmet Altay  wrote:
>
>>
>>
>> On Tue, Mar 27, 2018 at 7:12 AM, Holden Karau 
>> wrote:
>>
>>>
>>> On Tue, Mar 27, 2018 at 4:27 AM Robbe Sneyders <
>>> robbe.sneyd...@ml6.eu> wrote:
>>>
 Hi Anand,

 Thanks for the feedback.

 It should be no problem to run everything on DataflowRunner as well.
 Are there any performance tests in place to check for performance
 regressions?

>>>
>> Yes there is a suite (https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_beam_PerformanceTests_Python.groovy).
>> It may not be very comprehensive and seems to have been failing for a while.
>> I would not block Python 3 work on performance for now. Tha

Re: building on top of filesystem, can beam help?

2018-04-06 Thread Romain Manni-Bucau
I did a PR for that, but it just brings connectivity to Beam. To solve any
issue, the opposite is the only valid option.

On Apr 6, 2018 at 22:31, "Reuven Lax"  wrote:

In the other thread, we suggested writing a Beam FileSystem impl that wraps
VFS. Is that a path forward here? Then you can build on top of VFS instead,
and simply layer VfsFilesystem on top of it when running on Beam.
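For reference, the entry point Romain wants to reuse is the FileSystems facade
in the Java SDK. A minimal sketch of the glob-based browsing he describes
below, assuming only beam-sdks-java-core (with its built-in local filesystem)
is on the classpath and using a placeholder path:

  import java.io.IOException;
  import java.util.Collections;
  import org.apache.beam.sdk.io.FileSystems;
  import org.apache.beam.sdk.io.fs.MatchResult;
  import org.apache.beam.sdk.options.PipelineOptionsFactory;

  public class GlobBrowser {
    public static void main(String[] args) throws IOException {
      // Register the FileSystem implementations found on the classpath.
      FileSystems.setDefaultPipelineOptions(PipelineOptionsFactory.create());

      // Expand a glob; one MatchResult is returned per input spec.
      for (MatchResult result :
          FileSystems.match(Collections.singletonList("/tmp/logs/*.txt"))) {
        for (MatchResult.Metadata metadata : result.metadata()) {
          System.out.println(metadata.resourceId() + " (" + metadata.sizeBytes() + " bytes)");
        }
      }
    }
  }

Wrapping VFS would mean implementing the FileSystem SPI behind this same facade.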

On Fri, Apr 6, 2018 at 1:23 PM Romain Manni-Bucau 
wrote:

> Partially. It will run with Beam in half of the cases and without it in the
> remaining 50% (and in that case the dependencies+API are currently
> blocking). My constraint is that to activate any feature I must be able to
> cover both cases.
>
>
>
>
> On Apr 6, 2018 at 22:14, "Reuven Lax"  wrote:
>
>> So is this project of yours also built on top of Beam, or is it unrelated?
>>
>> On Fri, Apr 6, 2018 at 1:12 PM Romain Manni-Bucau 
>> wrote:
>>
>>> Issues forking are:
>>>
>>> 1. I have to drop beam FileIO (in all its flavors) which means not
>>> taking any benefit from beam in that area which is 50% of beam gain (the
>>> other being the portability)
>>> 2. I have to maintain a bridge for all filesystem impl - being said it
>>> still misses some info ATM
>>> 3. It is still in beam sdk so "here" which is misleading for dev (can be
>>> fixed if beam becomes modular)
>>>
>>> As a side note - and to link with another thread topic: with vfs as an
>>> abstraction i dont have that issue at all.
>>>
>>> On Apr 6, 2018 at 20:35, "Reuven Lax"  wrote:
>>>
 Personally, this is a case where I think forking might be a better
 option, even though I'm not generally a fan of duplicating code.

 In past projects, depending on internal modules of other projects never
 seems to lead to good outcomes. FileSystem exists for Beam today, and Beam
 might make changes to it that cause problems for your other project. I
 would recommend starting by forking if it serves your needs.

 Reuven

 On Fri, Apr 6, 2018 at 8:17 AM Romain Manni-Bucau <
 rmannibu...@gmail.com> wrote:

> Hi guys,
>
> I have a use case where I'd like to be able to expose to a user some
> file system navigation and enable him to visualize the file system (as in
> beam sense)
>
> Technically it is a matter of being able to use glob pattern to browse
> the file system using match(specs).
>
> What is important in that use case is to align the visualization and
> the potential runtime to have the same impl/view and not have to split it
> in 2 code branches which can lead to inconsistency.
>
> Therefore i'd like to be able to reuse beam FileSystem but I have a
> few blockers:
>
> 1. it is nested in sdk-java-core which brings 2 drawbacks
> a. it brings the whole beam sdk which is not desired in that part of
> the app (should not be visible in the classpath)
> b. the dependency stack is just impractical (guava, jackson,
> byte-buddy, avro, joda, at least, are not desired at all here) and a shade
> makes it way too fat to be a valid dependency for that usage
> 2. I don't know how to configure the FS from one of its instance (I'd
> like to be able to get its options class like 
> FileSystem#getConfigurationType
> returning a PipelineOptions)
>
> Do you think it is possible to extract the filesystem API in a
> dependency free beam subproject (or at least submodule) and add the
> configuration hint in the API?
>
> Romain Manni-Bucau
> @rmannibucau  |  Blog
>  | Old Blog
>  | Github
>  | LinkedIn
>  | Book
> 
>



Re: building on top of filesystem, can beam help?

2018-04-06 Thread Reuven Lax
In the other thread, we suggested writing a Beam FileSystem impl that wraps
VFS. Is that a path forward here? Then you can build on top of VFS instead,
and simply layer VfsFilesystem on top of it when running on Beam.

On Fri, Apr 6, 2018 at 1:23 PM Romain Manni-Bucau 
wrote:

> Partially. It will run with Beam in half of the cases and without it in the
> remaining 50% (and in that case the dependencies+API are currently
> blocking). My constraint is that to activate any feature I must be able to
> cover both cases.
>
>
>
>
> On Apr 6, 2018 at 22:14, "Reuven Lax"  wrote:
>
>> So is this project of yours also built on top of Beam, or is it unrelated?
>>
>> On Fri, Apr 6, 2018 at 1:12 PM Romain Manni-Bucau 
>> wrote:
>>
>>> Issues forking are:
>>>
>>> 1. I have to drop beam FileIO (in all its flavors) which means not
>>> taking any benefit from beam in that area which is 50% of beam gain (the
>>> other being the portability)
>>> 2. I have to maintain a bridge for all filesystem impl - being said it
>>> still misses some info ATM
>>> 3. It is still in beam sdk so "here" which is misleading for dev (can be
>>> fixed if beam becomes modular)
>>>
>>> As a side note - and to link with another thread topic: with vfs as an
>>> abstraction i dont have that issue at all.
>>>
>>> On Apr 6, 2018 at 20:35, "Reuven Lax"  wrote:
>>>
 Personally, this is a case where I think forking might be a better
 option, even though I'm not generally a fan of duplicating code.

 In past projects, depending on internal modules of other projects never
 seems to lead to good outcomes. FileSystem exists for Beam today, and Beam
 might make changes to it that cause problems for your other project. I
 would recommend starting by forking if it serves your needs.

 Reuven

 On Fri, Apr 6, 2018 at 8:17 AM Romain Manni-Bucau <
 rmannibu...@gmail.com> wrote:

> Hi guys,
>
> I have a use case where I'd like to be able to expose to a user some
> file system navigation and enable him to visualize the file system (as in
> beam sense)
>
> Technically it is a matter of being able to use glob pattern to browse
> the file system using match(specs).
>
> What is important in that use case is to align the visualization and
> the potential runtime to have the same impl/view and not have to split it
> in 2 code branches which can lead to inconsistency.
>
> Therefore i'd like to be able to reuse beam FileSystem but I have a
> few blockers:
>
> 1. it is nested in sdk-java-core which brings 2 drawbacks
> a. it brings the whole beam sdk which is not desired in that part of
> the app (should not be visible in the classpath)
> b. the dependency stack is just impractical (guava, jackson,
> byte-buddy, avro, joda, at least, are not desired at all here) and a shade
> makes it way too fat to be a valid dependency for that usage
> 2. I don't know how to configure the FS from one of its instance (I'd
> like to be able to get its options class like
> FileSystem#getConfigurationType returning a PipelineOptions)
>
> Do you think it is possible to extract the filesystem API in a
> dependency free beam subproject (or at least submodule) and add the
> configuration hint in the API?
>
> Romain Manni-Bucau
> @rmannibucau  |  Blog
>  | Old Blog
>  | Github
>  | LinkedIn
>  | Book
> 
>



Re: building on top of filesystem, can beam help?

2018-04-06 Thread Romain Manni-Bucau
Partially. It will run with Beam in half of the cases and without it in the
remaining 50% (and in that case the dependencies+API are currently
blocking). My constraint is that to activate any feature I must be able to
cover both cases.




On Apr 6, 2018 at 22:14, "Reuven Lax"  wrote:

> So is this project of yours also built on top of Beam, or is it unrelated?
>
> On Fri, Apr 6, 2018 at 1:12 PM Romain Manni-Bucau 
> wrote:
>
>> Issues forking are:
>>
>> 1. I have to drop beam FileIO (in all its flavors) which means not taking
>> any benefit from beam in that area which is 50% of beam gain (the other
>> being the portability)
>> 2. I have to maintain a bridge for all filesystem impl - being said it
>> still misses some info ATM
>> 3. It is still in beam sdk so "here" which is misleading for dev (can be
>> fixed if beam becomes modular)
>>
>> As a side note - and to link with another thread topic: with vfs as an
>> abstraction i dont have that issue at all.
>>
>> On Apr 6, 2018 at 20:35, "Reuven Lax"  wrote:
>>
>>> Personally, this is a case where I think forking might be a better
>>> option, even though I'm not generally a fan of duplicating code.
>>>
>>> In past projects, depending on internal modules of other projects never
>>> seems to lead to good outcomes. FileSystem exists for Beam today, and Beam
>>> might make changes to it that cause problems for your other project. I
>>> would recommend starting by forking if it serves your needs.
>>>
>>> Reuven
>>>
>>> On Fri, Apr 6, 2018 at 8:17 AM Romain Manni-Bucau 
>>> wrote:
>>>
 Hi guys,

 I have a use case where I'd like to be able to expose to a user some
 file system navigation and enable him to visualize the file system (as in
 beam sense)

 Technically it is a matter of being able to use glob pattern to browse
 the file system using match(specs).

 What is important in that use case is to align the visualization and
 the potential runtime to have the same impl/view and not have to split it
 in 2 code branches which can lead to inconsistency.

 Therefore i'd like to be able to reuse beam FileSystem but I have a few
 blockers:

 1. it is nested in sdk-java-core which brings 2 drawbacks
 a. it brings the whole beam sdk which is not desired in that part of
 the app (should not be visible in the classpath)
 b. the dependency stack is just impractical (guava, jackson,
 byte-buddy, avro, joda, at least, are not desired at all here) and a shade
 makes it way too fat to be a valid dependency for that usage
 2. I don't know how to configure the FS from one of its instance (I'd
 like to be able to get its options class like 
 FileSystem#getConfigurationType
 returning a PipelineOptions)

 Do you think it is possible to extract the filesystem API in a
 dependency free beam subproject (or at least submodule) and add the
 configuration hint in the API?

 Romain Manni-Bucau
 @rmannibucau  |  Blog
  | Old Blog
  | Github
  | LinkedIn
  | Book
 

>>>


Re: Gradle Tips and tricks

2018-04-06 Thread Ismaël Mejía
The goal of this document is to extend the contribution guide with some of
the cases that we use from time to time during development for quick or
specific validations. This comes from a previous discussion where Kenneth
mentioned that he also had a list of 'incantations' for Maven (as I did).
So it is better to share these 'tips' so everyone can benefit.

I think Scott and the people working on BEAM-3985 will continue focusing on
the updates there. I expect to migrate the missing parts from this doc
afterwards.

On Fri, Apr 6, 2018 at 8:23 PM, Reuven Lax  wrote:
> I think Scott just sent a summary.
>
> I agree, even when all coding work is done, the migration isn't done until
> the contribution guide (and any other documentation) is updated.
>
> On Fri, Apr 6, 2018 at 1:07 AM Jean-Baptiste Onofré  wrote:
>>
>> Agree, it's what I mentioned this morning on Slack.
>>
>> I think it would be great to have a summary of the current state on the
>> dev
>> mailing list.
>>
>> At the end of the day, the contribution guide should be updated (we have a
>> Jira
>> about that afair).
>>
>> Regards
>> JB
>>
>> On 04/06/2018 09:03 AM, Ismaël Mejía wrote:
>> > After some discussion on slack it is clear that we need to document
>> > some of the gradle replacements of our common maven commands. We
>> > started a shared doc yesterday to share some of those and other gradle
>> > tips and tricks. I invite everyone who can help to add their favorite
>> > gradle 'incantations' there and other related knowledge. We will
>> > migrate this info to the website afterwards.
>> >
>> > https://s.apache.org/beam-gradle-tips-edit
>> >
>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com


Re: building on top of filesystem, can beam help?

2018-04-06 Thread Reuven Lax
So is this project of yours also built on top of Beam, or is it unrelated?

On Fri, Apr 6, 2018 at 1:12 PM Romain Manni-Bucau 
wrote:

> Issues forking are:
>
> 1. I have to drop beam FileIO (in all its flavors) which means not taking
> any benefit from beam in that area which is 50% of beam gain (the other
> being the portability)
> 2. I have to maintain a bridge for all filesystem impl - being said it
> still misses some info ATM
> 3. It is still in beam sdk so "here" which is misleading for dev (can be
> fixed if beam becomes modular)
>
> As a side note - and to link with another thread topic: with vfs as an
> abstraction i dont have that issue at all.
>
> On Apr 6, 2018 at 20:35, "Reuven Lax"  wrote:
>
>> Personally, this is a case where I think forking might be a better
>> option, even though I'm not generally a fan of duplicating code.
>>
>> In past projects, depending on internal modules of other projects never
>> seems to lead to good outcomes. FileSystem exists for Beam today, and Beam
>> might make changes to it that cause problems for your other project. I
>> would recommend starting by forking if it serves your needs.
>>
>> Reuven
>>
>> On Fri, Apr 6, 2018 at 8:17 AM Romain Manni-Bucau 
>> wrote:
>>
>>> Hi guys,
>>>
>>> I have a use case where I'd like to be able to expose to a user some
>>> file system navigation and enable him to visualize the file system (as in
>>> beam sense)
>>>
>>> Technically it is a matter of being able to use glob pattern to browse
>>> the file system using match(specs).
>>>
>>> What is important in that use case is to align the visualization and the
>>> potential runtime to have the same impl/view and not have to split it in 2
>>> code branches which can lead to inconsistency.
>>>
>>> Therefore i'd like to be able to reuse beam FileSystem but I have a few
>>> blockers:
>>>
>>> 1. it is nested in sdk-java-core which brings 2 drawbacks
>>> a. it brings the whole beam sdk which is not desired in that part of the
>>> app (should not be visible in the classpath)
>>> b. the dependency stack is just impractical (guava, jackson,
>>> byte-buddy, avro, joda, at least, are not desired at all here) and a shade
>>> makes it way too fat to be a valid dependency for that usage
>>> 2. I don't know how to configure the FS from one of its instance (I'd
>>> like to be able to get its options class like
>>> FileSystem#getConfigurationType returning a PipelineOptions)
>>>
>>> Do you think it is possible to extract the filesystem API in a
>>> dependency free beam subproject (or at least submodule) and add the
>>> configuration hint in the API?
>>>
>>> Romain Manni-Bucau
>>> @rmannibucau  |  Blog
>>>  | Old Blog
>>>  | Github
>>>  | LinkedIn
>>>  | Book
>>> 
>>>
>>


Re: building on top of filesystem, can beam help?

2018-04-06 Thread Romain Manni-Bucau
The issues with forking are:

1. I have to drop Beam FileIO (in all its flavors), which means not taking
any benefit from Beam in that area, and that is 50% of Beam's gain (the
other being the portability)
2. I have to maintain a bridge for all filesystem impls - that said, it
still misses some info ATM
3. It is still in the Beam SDK, so "here", which is misleading for devs (can
be fixed if Beam becomes modular)

As a side note - and to link with another thread topic: with VFS as an
abstraction I don't have that issue at all.

On Apr 6, 2018 at 20:35, "Reuven Lax"  wrote:

> Personally, this is a case where I think forking might be a better option,
> even though I'm not generally a fan of duplicating code.
>
> In past projects, depending on internal modules of other projects never
> seems to lead to good outcomes. FileSystem exists for Beam today, and Beam
> might make changes to it that cause problems for your other project. I
> would recommend starting by forking if it serves your needs.
>
> Reuven
>
> On Fri, Apr 6, 2018 at 8:17 AM Romain Manni-Bucau 
> wrote:
>
>> Hi guys,
>>
>> I have a use case where I'd like to be able to expose to a user some file
>> system navigation and enable him to visualize the file system (as in beam
>> sense)
>>
>> Technically it is a matter of being able to use glob pattern to browse
>> the file system using match(specs).
>>
>> What is important in that use case is to align the visualization and the
>> potential runtime to have the same impl/view and not have to split it in 2
>> code branches which can lead to inconsistency.
>>
>> Therefore i'd like to be able to reuse beam FileSystem but I have a few
>> blockers:
>>
>> 1. it is nested in sdk-java-core which brings 2 drawbacks
>> a. it brings the whole beam sdk which is not desired in that part of the
>> app (should not be visible in the classpath)
>> b. the dependency stack is just impractical (guava, jackson, byte-buddy,
>> avro, joda, at least, are not desired at all here) and a shade makes it way
>> too fat to be a valid dependency for that usage
>> 2. I don't know how to configure the FS from one of its instance (I'd
>> like to be able to get its options class like FileSystem#getConfigurationType
>> returning a PipelineOptions)
>>
>> Do you think it is possible to extract the filesystem API in a dependency
>> free beam subproject (or at least submodule) and add the configuration hint
>> in the API?
>>
>> Romain Manni-Bucau
>> @rmannibucau  |  Blog
>>  | Old Blog
>>  | Github
>>  | LinkedIn
>>  | Book
>> 
>>
>


Re: About the Gauge metric API

2018-04-06 Thread Robert Bradshaw
On Fri, Apr 6, 2018 at 12:23 PM Ben Chambers  wrote:

> Generally strong +1 to everything Bill said. I would suggest though that
> the per-worker segmentation might be specified using some more general
> tagging/labeling API. For instance, all of the following seem like
> reasonable uses to support:
>
> 1. Gauge that is tagged with worker to get per-worker segmentation (such
> as queue size, memory usage, etc.)
> 2. Gauge that is tagged with the "key" being processed. Would be useful
> for things like how much data is buffered, where are watermark holds for
> each key, etc. If processing is partitioned by key, this is strictly more
> granular than per-worker.
> 3. Counter tagged with the "key" being processed. Would be useful for time
> spent processing each key, etc.
>

Per-key stuff gets really dangerous, as then the counter (control) plane
has O(#keys) items to keep track of. That is unless it is paired with some
kind of a lossy top/histogram aggregation.

However, I think Bill hits the nail on the head that there is an implicit
(ill-defined, in the model at least) segmentation desired here, with
different aggregations happening within vs. across segments. (Also, FWIW,
clobber is not the only aggregation that makes sense at the lowest level.)
Default counters use the same aggregation across both levels, giving useful
and well-defined semantics regardless of bundling and work distribution
(assuming associative aggregation of course), but perhaps the
counter/metrics APIs could be augmented to be able to explicitly request
the level and differing aggregation with respect to this segmentation.
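To make that concrete, one possible shape for such an augmented API is
sketched below. This is purely hypothetical, for discussion only; neither
MetricLabel nor LabeledGauge exists in the Beam SDK today:

  import java.util.Objects;

  // Hypothetical sketch only -- nothing here exists in the Beam SDK.
  public final class LabeledMetricsSketch {

    // A key/value tag identifying the segment a report belongs to (worker id, key, ...).
    public static final class MetricLabel {
      public final String key;
      public final String value;

      private MetricLabel(String key, String value) {
        this.key = Objects.requireNonNull(key);
        this.value = Objects.requireNonNull(value);
      }

      public static MetricLabel of(String key, String value) {
        return new MetricLabel(key, value);
      }
    }

    // A gauge whose reports are kept per label set instead of clobbering globally.
    public interface LabeledGauge {
      // "Latest value wins" applies only within the segment identified by the labels;
      // across segments a backend may display them side by side or sum them.
      void set(long value, MetricLabel... labels);
    }
  }

A runner could attach a default worker label automatically, which is roughly
the per-worker segmentation Bill and Ben describe.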


> On Fri, Apr 6, 2018 at 11:39 AM Bill Neubauer  wrote:
>
>> Thanks for unraveling those themes, Pablo!
>>
>> 1. Seems reasonable in light of behaviors metrics backends support.
>> 2. Those same backends support histogramming of data, so having integer
>> types is very useful.
>> 3. I believe that is the case, for the reasons I mentioned earlier,
>> Gauges should only clobber values previously reported by the same entity.
>> Two workers with the same gauge should not be overwriting each other's
>> values, only their own. This implies per-worker segmentation.
>>
>>
>> On Fri, Apr 6, 2018 at 11:35 AM Pablo Estrada  wrote:
>>
>>> Nobody wants to get rid of Gauges. I see that we have three separate
>>> themes being discussed here, and I think it's useful to point them out and
>>> address them independently:
>>>
>>> 1. Whether Gauges should change to hold string values.
>>> 2. If Gauges are to support string values, whether Gauges should also
>>> continue to have an int API.
>>> 3. Whether Beam should support some sort of label / tag / worker-id for
>>> aggregation of Gauges (maybe other metrics?)
>>>
>>> -P.
>>>
>>> On Fri, Apr 6, 2018 at 11:21 AM Ben Chambers 
>>> wrote:
>>>
 Gauges are incredibly useful for exposing the current state of the
 system. For instance, number of elements in a queue, current memory usage,
 number of RPCs in flight, etc. As mentioned above, these concepts exist in
 numerous systems for monitoring distributed environments, including
 Stackdriver Monitoring. The key to making them work is the addition of
 labels or tags, which as an aside are also useful for *all* metric types,
 not just Gauges.

 If Beam gets rid of Gauges, how would we go about exporting "current"
 values like memory usage, RPCs in flight, etc.?

 -- Ben

 On Fri, Apr 6, 2018 at 11:13 AM Kenneth Knowles  wrote:

> Just naively - the use cases that Gauge addresses seem relevant, and
> the information seems feasible to gather and present. The bit that doesn't
> seem to make sense is aggregating gauges by clobbering each other. So I
> think that's just +1 Ben?
>
> On Fri, Apr 6, 2018 at 10:26 AM Raghu Angadi 
> wrote:
>
>> I am not opposed to removing other data types, though they are extra
>> convenience for user.
>>
>> In Scott's example above, if the metric is a counter, what are the
>> guarantees provided? E.g. would it match the global count using GBK? If
>> yes, then gauges (especially per-key gauges) can be very useful too (e.g.
>> backlog for each Kafka partition/split).
>>
>> On Fri, Apr 6, 2018 at 10:01 AM Robert Bradshaw 
>> wrote:
>>
>>> A String API makes it clear(er) that the values will not be
>>> aggregated in any way across workers. I don't think retaining both APIs
>>> (except for possibly some short migration period) worthwhile. On another
>>> note, I still find the distributed gauge API to be a bit odd in general.
>>>
>>> On Fri, Apr 6, 2018 at 9:46 AM Raghu Angadi 
>>> wrote:
>>>
 I would be in favor of replacing the existing Gauge.set(long) API
> with the String version and removing the old one. This would be a 
> breaking
> change. However this is a relatively new API and is still marked
> @Experimenta

Re: About the Gauge metric API

2018-04-06 Thread Ben Chambers
Generally strong +1 to everything Bill said. I would suggest though that
the per-worker segmentation might be specified using some more general
tagging/labeling API. For instance, all of the following seem like
reasonable uses to support:

1. Gauge that is tagged with worker to get per-worker segmentation (such as
queue size, memory usage, etc.)
2. Gauge that is tagged with the "key" being processed. Would be useful for
things like how much data is buffered, where are watermark holds for each
key, etc. If processing is partitioned by key, this is strictly more
granular than per-worker.
3. Counter tagged with the "key" being processed. Would be useful for time
spent processing each key, etc.
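For concreteness, a minimal sketch of what use case 1 looks like with today's
Java API (the metric namespace and names are placeholders): the counter is
summed across workers, while the gauge only ever holds the latest value
reported locally, which is exactly where a worker tag would help.

  import org.apache.beam.sdk.metrics.Counter;
  import org.apache.beam.sdk.metrics.Gauge;
  import org.apache.beam.sdk.metrics.Metrics;
  import org.apache.beam.sdk.transforms.DoFn;

  public class MonitoredFn extends DoFn<String, String> {
    // Summed across all workers by the metrics system.
    private final Counter elements = Metrics.counter("monitored-fn", "elements");
    // "Latest value" only; without a worker tag, reports from different workers clobber each other.
    private final Gauge lastElementSize = Metrics.gauge("monitored-fn", "lastElementSize");

    @ProcessElement
    public void processElement(ProcessContext c) {
      elements.inc();
      lastElementSize.set(c.element().length());
      c.output(c.element());
    }
  }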

On Fri, Apr 6, 2018 at 11:39 AM Bill Neubauer  wrote:

> Thanks for unraveling those themes, Pablo!
>
> 1. Seems reasonable in light of behaviors metrics backends support.
> 2. Those same backends support histogramming of data, so having integer
> types is very useful.
> 3. I believe that is the case, for the reasons I mentioned earlier, Gauges
> should only clobber values previously reported by the same entity. Two
> workers with the same gauge should not be overwriting each other's values,
> only their own. This implies per-worker segmentation.
>
>
> On Fri, Apr 6, 2018 at 11:35 AM Pablo Estrada  wrote:
>
>> Nobody wants to get rid of Gauges. I see that we have three separate
>> themes being discussed here, and I think it's useful to point them out and
>> address them independently:
>>
>> 1. Whether Gauges should change to hold string values.
>> 2. If Gauges are to support string values, whether Gauges should also
>> continue to have an int API.
>> 3. Whether Beam should support some sort of label / tag / worker-id for
>> aggregation of Gauges (maybe other metrics?)
>>
>> -P.
>>
>> On Fri, Apr 6, 2018 at 11:21 AM Ben Chambers 
>> wrote:
>>
>>> Gauges are incredibly useful for exposing the current state of the
>>> system. For instance, number of elements in a queue, current memory usage,
>>> number of RPCs in flight, etc. As mentioned above, these concepts exist in
>>> numerous systems for monitoring distributed environments, including
>>> Stackdriver Monitoring. The key to making them work is the addition of
>>> labels or tags, which as an aside are also useful for *all* metric types,
>>> not just Gauges.
>>>
>>> If Beam gets rid of Gauges, how would we go about exporting "current"
>>> values like memory usage, RPCs in flight, etc.?
>>>
>>> -- Ben
>>>
>>> On Fri, Apr 6, 2018 at 11:13 AM Kenneth Knowles  wrote:
>>>
 Just naively - the use cases that Gauge addresses seem relevant, and
 the information seems feasible to gather and present. The bit that doesn't
 seem to make sense is aggregating gauges by clobbering each other. So I
 think that's just +1 Ben?

 On Fri, Apr 6, 2018 at 10:26 AM Raghu Angadi 
 wrote:

> I am not opposed to removing other data types, though they are extra
> convenience for user.
>
> In Scott's example above, if the metric is a counter, what are the
> guarantees provided? E.g. would it match the global count using GBK? If
> yes, then gauges (especially per-key gauges) can be very useful too (e.g.
> backlog for each Kafka partition/split).
>
> On Fri, Apr 6, 2018 at 10:01 AM Robert Bradshaw 
> wrote:
>
>> A String API makes it clear(er) that the values will not be
>> aggregated in any way across workers. I don't think retaining both APIs
>> (except for possibly some short migration period) worthwhile. On another
>> note, I still find the distributed gauge API to be a bit odd in general.
>>
>> On Fri, Apr 6, 2018 at 9:46 AM Raghu Angadi 
>> wrote:
>>
>>> I would be in favor of replacing the existing Gauge.set(long) API
 with the String version and removing the old one. This would be a 
 breaking
 change. However this is a relatively new API and is still marked
 @Experimental. Keeping the old API would retain the potential 
 confusion.
 It's better to simplify the API surface: having two APIs makes it less
 clear which one users should choose.
>>>
>>>
>>> Supporting additional data types sounds good. But the above states
>>> string API will replace the existing API. I do not see how string API 
>>> makes
>>> the semantics more clear.  Semantically both are same to the user.
>>>
>>> On Fri, Apr 6, 2018 at 9:31 AM Pablo Estrada 
>>> wrote:
>>>
 Hi Ben : D

 Sure, that's reasonable. And perhaps I started the discussion in
 the wrong direction. I'm not questioning the utility of Gauge metrics.

 What I'm saying is that Beam only supports integers, but Gauges
 are aggregated by dropping old values depending on their update times; 
 so
 it might be desirable to not restrict the data type to just integers.

 -P

Re: About the Gauge metric API

2018-04-06 Thread Kenneth Knowles
In terms of natural language, I don't think "gauge" makes sense for strings. A
gauge measures a quantity. A string is not a quantity. So I like a separate
API, like Robert says. Backends can go ahead and implement leaf String and
Gauge collectors with the same data structure if they like.

In implementation / reporting, it also may sometimes make sense to sum
gauges, possibly within a limited scope. Since gauge updates aren't
synchronized it would not be perfect but you could ballpark some quantity
across the fleet. This wouldn't make sense for strings. Just another
consequence of the fact that gauges measure a quantity.
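As a strawman for that separate API, something like the following could cover
the string use case without overloading Gauge. Hypothetical sketch only; no
such interface exists in the Beam SDK today:

  // Hypothetical sketch only.
  // Keeps Gauge numeric (so sums/histograms stay meaningful) and moves
  // string-valued "latest value" reporting behind its own, clearly non-numeric API.
  public interface LatestString {
    // Records the most recent string value reported by this entity (e.g. this worker);
    // values from other entities are not aggregated with it.
    void set(String value);
  }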

On Fri, Apr 6, 2018 at 11:39 AM Bill Neubauer  wrote:

> Thanks for unraveling those themes, Pablo!
>
> 1. Seems reasonable in light of behaviors metrics backends support.
> 2. Those same backends support histogramming of data, so having integer
> types is very useful.
> 3. I believe that is the case, for the reasons I mentioned earlier, Gauges
> should only clobber values previously reported by the same entity. Two
> workers with the same gauge should not be overwriting each other's values,
> only their own. This implies per-worker segmentation.
>
>
> On Fri, Apr 6, 2018 at 11:35 AM Pablo Estrada  wrote:
>
>> Nobody wants to get rid of Gauges. I see that we have three separate
>> themes being discussed here, and I think it's useful to point them out and
>> address them independently:
>>
>> 1. Whether Gauges should change to hold string values.
>> 2. If Gauges are to support string values, whether Gauges should also
>> continue to have an int API.
>> 3. Whether Beam should support some sort of label / tag / worker-id for
>> aggregation of Gauges (maybe other metrics?)
>>
>> -P.
>>
>> On Fri, Apr 6, 2018 at 11:21 AM Ben Chambers 
>> wrote:
>>
>>> Gauges are incredibly useful for exposing the current state of the
>>> system. For instance, number of elements in a queue, current memory usage,
>>> number of RPCs in flight, etc. As mentioned above, these concepts exist in
>>> numerous systems for monitoring distributed environments, including
>>> Stackdriver Monitoring. The key to making them work is the addition of
>>> labels or tags, which as an aside are also useful for *all* metric types,
>>> not just Gauges.
>>>
>>> If Beam gets rid of Gauges, how would we go about exporting "current"
>>> values like memory usage, RPCs in flight, etc.?
>>>
>>> -- Ben
>>>
>>> On Fri, Apr 6, 2018 at 11:13 AM Kenneth Knowles  wrote:
>>>
 Just naively - the use cases that Gauge addresses seem relevant, and
 the information seems feasible to gather and present. The bit that doesn't
 seem to make sense is aggregating gauges by clobbering each other. So I
 think that's just +1 Ben?

 On Fri, Apr 6, 2018 at 10:26 AM Raghu Angadi 
 wrote:

> I am not opposed to removing other data types, though they are extra
> convenience for user.
>
> In Scott's example above, if the metric is a counter, what are the
> guarantees provided? E.g. would it match the global count using GBK? If
> yes, then gauges (especially per-key gauges) can be very useful too (e.g.
> backlog for each Kafka partition/split).
>
> On Fri, Apr 6, 2018 at 10:01 AM Robert Bradshaw 
> wrote:
>
>> A String API makes it clear(er) that the values will not be
>> aggregated in any way across workers. I don't think retaining both APIs
>> (except for possibly some short migration period) worthwhile. On another
>> note, I still find the distributed gauge API to be a bit odd in general.
>>
>> On Fri, Apr 6, 2018 at 9:46 AM Raghu Angadi 
>> wrote:
>>
>>> I would be in favor of replacing the existing Gauge.set(long) API
 with the String version and removing the old one. This would be a 
 breaking
 change. However this is a relatively new API and is still marked
 @Experimental. Keeping the old API would retain the potential 
 confusion.
 It's better to simplify the API surface: having two APIs makes it less
 clear which one users should choose.
>>>
>>>
>>> Supporting additional data types sounds good. But the above states
>>> string API will replace the existing API. I do not see how string API 
>>> makes
>>> the semantics more clear.  Semantically both are same to the user.
>>>
>>> On Fri, Apr 6, 2018 at 9:31 AM Pablo Estrada 
>>> wrote:
>>>
 Hi Ben : D

 Sure, that's reasonable. And perhaps I started the discussion in
 the wrong direction. I'm not questioning the utility of Gauge metrics.

 What I'm saying is that Beam only supports integers, but Gauges
 are aggregated by dropping old values depending on their update times; 
 so
 it might be desirable to not restrict the data type to just integers.

 -P.

 On Fri, Apr 6, 2018 at 9:19 AM Ben Chambers 
 w

Re: About the Gauge metric API

2018-04-06 Thread Bill Neubauer
Thanks for unraveling those themes, Pablo!

1. Seems reasonable in light of behaviors metrics backends support.
2. Those same backends support histogramming of data, so having integer
types is very useful.
3. I believe that is the case, for the reasons I mentioned earlier, Gauges
should only clobber values previously reported by the same entity. Two
workers with the same gauge should not be overwriting each other's values,
only their own. This implies per-worker segmentation.


On Fri, Apr 6, 2018 at 11:35 AM Pablo Estrada  wrote:

> Nobody wants to get rid of Gauges. I see that we have three separate
> themes being discussed here, and I think it's useful to point them out and
> address them independently:
>
> 1. Whether Gauges should change to hold string values.
> 2. If Gauges are to support string values, whether Gauges should also
> continue to have an int API.
> 3. Whether Beam should support some sort of label / tag / worker-id for
> aggregation of Gauges (maybe other metrics?)
>
> -P.
>
> On Fri, Apr 6, 2018 at 11:21 AM Ben Chambers  wrote:
>
>> Gauges are incredibly useful for exposing the current state of the
>> system. For instance, number of elements in a queue, current memory usage,
>> number of RPCs in flight, etc. As mentioned above, these concepts exist in
>> numerous systems for monitoring distributed environments, including
>> Stackdriver Monitoring. The key to making them work is the addition of
>> labels or tags, which as an aside are also useful for *all* metric types,
>> not just Gauges.
>>
>> If Beam gets rid of Gauges, how would we go about exporting "current"
>> values like memory usage, RPCs in flight, etc.?
>>
>> -- Ben
>>
>> On Fri, Apr 6, 2018 at 11:13 AM Kenneth Knowles  wrote:
>>
>>> Just naively - the use cases that Gauge addresses seem relevant, and the
>>> information seems feasible to gather and present. The bit that doesn't seem
>>> to make sense is aggregating gauges by clobbering each other. So I think
>>> that's just +1 Ben?
>>>
>>> On Fri, Apr 6, 2018 at 10:26 AM Raghu Angadi  wrote:
>>>
 I am not opposed to removing other data types, though they are extra
 convenience for user.

 In Scott's example above, if the metric is a counter, what are the
 guarantees provided? E.g. would it match the global count using GBK? If
 yes, then gauges (especially per-key gauges) can be very useful too (e.g.
 backlog for each Kafka partition/split).

 On Fri, Apr 6, 2018 at 10:01 AM Robert Bradshaw 
 wrote:

> A String API makes it clear(er) that the values will not be aggregated
> in any way across workers. I don't think retaining both APIs (except for
> possibly some short migration period) worthwhile. On another note, I still
> find the distributed gauge API to be a bit odd in general.
>
> On Fri, Apr 6, 2018 at 9:46 AM Raghu Angadi 
> wrote:
>
>> I would be in favor of replacing the existing Gauge.set(long) API
>>> with the String version and removing the old one. This would be a 
>>> breaking
>>> change. However this is a relatively new API and is still marked
>>> @Experimental. Keeping the old API would retain the potential confusion.
>>> It's better to simplify the API surface: having two APIs makes it less
>>> clear which one users should choose.
>>
>>
>> Supporting additional data types sounds good. But the above states
>> string API will replace the existing API. I do not see how string API 
>> makes
>> the semantics more clear.  Semantically both are same to the user.
>>
>> On Fri, Apr 6, 2018 at 9:31 AM Pablo Estrada 
>> wrote:
>>
>>> Hi Ben : D
>>>
>>> Sure, that's reasonable. And perhaps I started the discussion in the
>>> wrong direction. I'm not questioning the utility of Gauge metrics.
>>>
>>> What I'm saying is that Beam only supports integers, but Gauges are
>>> aggregated by dropping old values depending on their update times; so it
>>> might be desirable to not restrict the data type to just integers.
>>>
>>> -P.
>>>
>>> On Fri, Apr 6, 2018 at 9:19 AM Ben Chambers 
>>> wrote:
>>>
 See for instance how gauge metrics are handled in Prometheus,
 Datadog and Stackdriver monitoring. Gauges are perfect for use in
 distributed systems, they just need to be properly labeled. Perhaps we
 should apply a default tag or allow users to specify one.

 On Fri, Apr 6, 2018, 9:14 AM Ben Chambers 
 wrote:

> Some metrics backend label the value, for instance with the worker
> that sent it. Then the aggregation is latest per label. This makes it
> useful for holding values such as "memory usage" that need to hold 
> current
> value.
>
> On Fri, Apr 6, 2018, 9:00 AM Scott Wegner 
> wrote:
>
>> +1 on the proposal to support a "String" gauge.

Re: About the Gauge metric API

2018-04-06 Thread Pablo Estrada
Nobody wants to get rid of Gauges. I see that we have three separate themes
being discussed here, and I think it's useful to point them out and address
them independently:

1. Whether Gauges should change to hold string values.
2. If Gauges are to support string values, whether Gauges should also
continue to have an int API.
3. Whether Beam should support some sort of label / tag / worker-id for
aggregation of Gauges (maybe other metrics?)

-P.

On Fri, Apr 6, 2018 at 11:21 AM Ben Chambers  wrote:

> Gauges are incredibly useful for exposing the current state of the system.
> For instance, number of elements in a queue, current memory usage, number
> of RPCs in flight, etc. As mentioned above, these concepts exist in
> numerous systems for monitoring distributed environments, including
> Stackdriver Monitoring. The key to making them work is the addition of
> labels or tags, which as an aside are also useful for *all* metric types,
> not just Gauges.
>
> If Beam gets rid of Gauges, how would we go about exporting "current"
> values like memory usage, RPCs in flight, etc.?
>
> -- Ben
>
> On Fri, Apr 6, 2018 at 11:13 AM Kenneth Knowles  wrote:
>
>> Just naively - the use cases that Gauge addresses seem relevant, and the
>> information seems feasible to gather and present. The bit that doesn't seem
>> to make sense is aggregating gauges by clobbering each other. So I think
>> that's just +1 Ben?
>>
>> On Fri, Apr 6, 2018 at 10:26 AM Raghu Angadi  wrote:
>>
>>> I am not opposed to removing other data types, though they are extra
>>> convenience for user.
>>>
>>> In Scott's example above, if the metric is a counter, what are the
>>> guarantees provided? E.g. would it match the global count using GBK? If
>>> yes, then gauges (especially per-key gauges) can be very useful too (e.g.
>>> backlog for each Kafka partition/split).
>>>
>>> On Fri, Apr 6, 2018 at 10:01 AM Robert Bradshaw 
>>> wrote:
>>>
 A String API makes it clear(er) that the values will not be aggregated
 in any way across workers. I don't think retaining both APIs (except for
 possibly some short migration period) worthwhile. On another note, I still
 find the distributed gauge API to be a bit odd in general.

 On Fri, Apr 6, 2018 at 9:46 AM Raghu Angadi  wrote:

> I would be in favor of replacing the existing Gauge.set(long) API with
>> the String version and removing the old one. This would be a breaking
>> change. However this is a relatively new API and is still marked
>> @Experimental. Keeping the old API would retain the potential confusion.
>> It's better to simplify the API surface: having two APIs makes it less
>> clear which one users should choose.
>
>
> Supporting additional data types sounds good. But the above states
> string API will replace the existing API. I do not see how string API 
> makes
> the semantics more clear.  Semantically both are same to the user.
>
> On Fri, Apr 6, 2018 at 9:31 AM Pablo Estrada 
> wrote:
>
>> Hi Ben : D
>>
>> Sure, that's reasonable. And perhaps I started the discussion in the
>> wrong direction. I'm not questioning the utility of Gauge metrics.
>>
>> What I'm saying is that Beam only supports integers, but Gauges are
>> aggregated by dropping old values depending on their update times; so it
>> might be desirable to not restrict the data type to just integers.
>>
>> -P.
>>
>> On Fri, Apr 6, 2018 at 9:19 AM Ben Chambers 
>> wrote:
>>
>>> See for instance how gauge metrics are handled in Prometheus,
>>> Datadog and Stackdriver monitoring. Gauges are perfect for use in
>>> distributed systems, they just need to be properly labeled. Perhaps we
>>> should apply a default tag or allow users to specify one.
>>>
>>> On Fri, Apr 6, 2018, 9:14 AM Ben Chambers 
>>> wrote:
>>>
 Some metrics backend label the value, for instance with the worker
 that sent it. Then the aggregation is latest per label. This makes it
 useful for holding values such as "memory usage" that need to hold 
 current
 value.

 On Fri, Apr 6, 2018, 9:00 AM Scott Wegner 
 wrote:

> +1 on the proposal to support a "String" gauge.
>
> To expand a bit, the current API doesn't make it clear that the
> gauge value is based on local state. If a runner chooses to 
> parallelize a
> DoFn across many workers, each worker will have its own local Gauge 
> metric
> and its updates will overwrite other values. For example, from the 
> API it
> looks like you could use a gauge to implement your own element count 
> metric:
>
> long count = 0;
> @ProcessElement
> public void processElement(ProcessContext c) {
>   myGauge.set(++count);
>   c.output(c.element());
> }
>

Re: building on top of filesystem, can beam help?

2018-04-06 Thread Reuven Lax
Personally, this is a case where I think forking might be a better option,
even though I'm not generally a fan of duplicating code.

In past projects, depending on internal modules of other projects never
seems to lead to good outcomes. FileSystem exists for Beam today, and Beam
might make changes to it that cause problems for your other project. I
would recommend starting by forking if it serves your needs.

Reuven

On Fri, Apr 6, 2018 at 8:17 AM Romain Manni-Bucau 
wrote:

> Hi guys,
>
> I have a use case where I'd like to be able to expose to a user some file
> system navigation and enable him to visualize the file system (as in beam
> sense)
>
> Technically it is a matter of being able to use glob pattern to browse the
> file system using match(specs).
>
> What is important in that use case is to align the visualization and the
> potential runtime to have the same impl/view and not have to split it in 2
> code branches which can lead to inconsistency.
>
> Therefore i'd like to be able to reuse beam FileSystem but I have a few
> blockers:
>
> 1. it is nested in sdk-java-core which brings 2 drawbacks
> a. it brings the whole beam sdk which is not desired in that part of the
> app (should not be visible in the classpath)
> b. the dependency stack is just impractical (guava, jackson, byte-buddy,
> avro, joda, at least, are not desired at all here) and a shade makes it way
> too fat to be a valid dependency for that usage
> 2. I don't know how to configure the FS from one of its instance (I'd like
> to be able to get its options class like FileSystem#getConfigurationType
> returning a PipelineOptions)
>
> Do you think it is possible to extract the filesystem API in a dependency
> free beam subproject (or at least submodule) and add the configuration hint
> in the API?
>
> Romain Manni-Bucau
> @rmannibucau  |  Blog
>  | Old Blog
>  | Github
>  | LinkedIn
>  | Book
> 
>


Re: Gradle Status [April 6]

2018-04-06 Thread Romain Manni-Bucau
Why not build a zip per runner with its stack, point to that zip, and let
Beam lazy-load the runner:

--runner=LazyRunner --lazyRunnerDir=... --lazyRunnerOptions=... (or the
fromSystemProperties() if it gets merged a day ;))
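As a hypothetical sketch of what that proposal could look like as a standard
PipelineOptions interface (LazyRunner, lazyRunnerDir and lazyRunnerOptions are
Romain's proposed names, not an existing Beam feature):

  import org.apache.beam.sdk.options.Description;
  import org.apache.beam.sdk.options.PipelineOptions;

  // Hypothetical sketch of the proposed flags; no LazyRunner exists in Beam today.
  public interface LazyRunnerOptions extends PipelineOptions {
    @Description("Directory containing the zipped runner stack to lazy-load")
    String getLazyRunnerDir();
    void setLazyRunnerDir(String value);

    @Description("Options forwarded to the lazily loaded runner")
    String getLazyRunnerOptions();
    void setLazyRunnerOptions(String value);
  }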

Le 6 avr. 2018 20:21, "Kenneth Knowles"  a écrit :

> I'm working on finding a solution for launching the Nexmark suite with
> each runner. This doesn't have to be done via Gradle, but we anyhow need
> built artifacts that don't require user classpath intervention.
>
> It looks to me like the examples are also missing this - they have
> separate configuration e.g. sparkRunnerPreCommit but that is overspecified
> compared to a free-form launching of a main() program with a runner profile.
>
> On Fri, Apr 6, 2018 at 11:09 AM Lukasz Cwik  wrote:
>
>> Romain, are you talking about the profiles that exist as part of the
>> archetype examples?
>>
>> If so, then those still exist and haven't been changed. If not, can you
>> provide a link to the profile in a pom file to be clearer?
>>
>> On Fri, Apr 6, 2018 at 12:40 PM Romain Manni-Bucau 
>> wrote:
>>
>>> Hi Scott,
>>>
>>> is it right that 2 doesn't handle the hierarchy anymore and that it
>>> doesn't handle profiles for runners as it is currently with maven?
>>>
>>>
>>> Romain Manni-Bucau
>>> @rmannibucau  |  Blog
>>>  | Old Blog
>>>  | Github
>>>  | LinkedIn
>>>  | Book
>>> 
>>>
>>> 2018-04-06 18:32 GMT+02:00 Scott Wegner :
>>>
 I wanted to start a thread to summarize the current state of Gradle
 migration. We've made lots of good progress so far this week. Here's the
 status from what I can tell-- please add or correct anything I missed:

 * Release artifacts can be built and published for Snapshot and
 official releases [1]
 * Gradle-generated releases have been validated with the Apache
 Beam archetype generation quickstart; still needs additional validation.
 * Generated release pom files have correct project metadata [2]
 * The python pre-commits are now working in Gradle [3]
 * Ismaël has started a collaborative doc of Gradle tips [4] as we all
 learn the new system-- please add your own. This will eventually feed into
 official documentation on the website.
 * Łukasz Gajowy is working on migrating performance testing framework
 [5]
 * Daniel is working on updating documentation to refer to Gradle
 instead of maven

 If I missed anything, please add it to this thread.

 The general roadmap we're working towards is:
 (a) Publish release artifacts with Gradle (SNAPSHOT and signed releases)
 (b) Postcommits migrated to Gradle
 (c) Migrate documentation from maven to Gradle
 (d) Migrate perfkit suites to use Gradle

 For those of you that are hacking: thanks for your help so far!
 Progress is being roughly tracked on the Kanban [6]; please make sure the
 issues assigned to you are up-to-date. Many of the changes are staged on
 lukecwik's local branch [7]; we'll work on merging them back soon.


 [1] https://github.com/lukecwik/incubator-beam/pull/7
 [2] https://github.com/lukecwik/incubator-beam/pull/3
 [3] https://github.com/apache/beam/pull/5032
 [4] https://docs.google.com/document/d/1wR56Jef3XIPwj4DFzQKznuGPM3JDf
 RDVkxzeDlbdVSQ/edit
 [5] https://github.com/apache/beam/pull/5003
 [6] https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=242

 [7] https://github.com/lukecwik/incubator-beam/tree/gradle
 --


 Got feedback? http://go/swegner-feedback

>>>
>>>


Re: Gradle Tips and tricks

2018-04-06 Thread Reuven Lax
I think Scott just sent a summary.

I agree, even when all coding work is done, the migration isn't done until
the contribution guide (and any other documentation) is updated.

On Fri, Apr 6, 2018 at 1:07 AM Jean-Baptiste Onofré  wrote:

> Agree, it's what I mentioned this morning on Slack.
>
> I think it would be great to have a summary of the current state on the dev
> mailing list.
>
> At the end of the day, the contribution guide should be updated (we have a
> Jira
> about that afair).
>
> Regards
> JB
>
> On 04/06/2018 09:03 AM, Ismaël Mejía wrote:
> > After some discussion on slack it is clear that we need to document
> > some of the gradle replacements of our common maven commands. We
> > started a shared doc yesterday to share some of those and other gradle
> > tips and tricks. I invite everyone who can help to add their favorite
> > gradle 'incantations' there and other related knowledge. We will
> > migrate this info to the website afterwards.
> >
> > https://s.apache.org/beam-gradle-tips-edit
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: Gradle Status [April 6]

2018-04-06 Thread Romain Manni-Bucau
Le 6 avr. 2018 20:09, "Lukasz Cwik"  a écrit :

Romain, are you talking about the profiles that exist as part of the
archetype examples?


I was thinking more of this kind of profile:
https://github.com/apache/beam/blob/master/sdks/java/nexmark/pom.xml (which
should hit all IOs at some point to ensure their portability)

The idea is to be able to extract the deps for a runner from a particular pom,
since it sometimes requires some dependency work to resolve conflicts.





If so, then those still exist and haven't been changed. If not, can you
provide a link to the profile in a pom file to be clearer?

On Fri, Apr 6, 2018 at 12:40 PM Romain Manni-Bucau 
wrote:

> Hi Scott,
>
> is it right that 2 doesn't handle the hierarchy anymore and that it doesn't
> handle profiles for runners as it is currently with maven?
>
>
> Romain Manni-Bucau
> @rmannibucau  |  Blog
>  | Old Blog
>  | Github
>  | LinkedIn
>  | Book
> 
>
> 2018-04-06 18:32 GMT+02:00 Scott Wegner :
>
>> I wanted to start a thread to summarize the current state of Gradle
>> migration. We've made lots of good progress so far this week. Here's the
>> status from what I can tell-- please add or correct anything I missed:
>>
>> * Release artifacts can be built and published for Snapshot and official
>> releases [1]
>> * Gradle-generated releases have been validated with the Apache Beam
>> archetype generation quickstart; still needs additional validation.
>> * Generated release pom files have correct project metadata [2]
>> * The python pre-commits are now working in Gradle [3]
>> * Ismaël has started a collaborative doc of Gradle tips [4] as we all
>> learn the new system-- please add your own. This will eventually feed into
>> official documentation on the website.
>> * Łukasz Gajowy is working on migrating performance testing framework [5]
>> * Daniel is working on updating documentation to refer to Gradle instead
>> of maven
>>
>> If I missed anything, please add it to this thread.
>>
>> The general roadmap we're working towards is:
>> (a) Publish release artifacts with Gradle (SNAPSHOT and signed releases)
>> (b) Postcommits migrated to Gradle
>> (c) Migrate documentation from maven to Gradle
>> (d) Migrate perfkit suites to use Gradle
>>
>> For those of you that are hacking: thanks for your help so far! Progress
>> is being roughly tracked on the Kanban [6]; please make sure the issues
>> assigned to you are up-to-date. Many of the changes are staged on
>> lukecwik's local branch [7]; we'll work on merging them back soon.
>>
>>
>> [1] https://github.com/lukecwik/incubator-beam/pull/7
>> [2] https://github.com/lukecwik/incubator-beam/pull/3
>> [3] https://github.com/apache/beam/pull/5032
>> [4] https://docs.google.com/document/d/1wR56Jef3XIPwj4DFzQKznuGPM3JDf
>> RDVkxzeDlbdVSQ/edit
>> [5] https://github.com/apache/beam/pull/5003
>> [6] https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=242
>> [7] https://github.com/lukecwik/incubator-beam/tree/gradle
>> --
>>
>>
>> Got feedback? http://go/swegner-feedback
>>
>
>


Re: About the Gauge metric API

2018-04-06 Thread Ben Chambers
Gauges are incredibly useful for exposing the current state of the system.
For instance, number of elements in a queue, current memory usage, number
of RPCs in flight, etc. As mentioned above, these concepts exist in
numerous systems for monitoring distributed environments, including
Stackdriver Monitoring. The key to making them work is the addition of
labels or tags, which as an aside are also useful for *all* metric types,
not just Gauges.

If Beam gets rid of Gauges, how would we go about exporting "current"
values like memory usage, RPCs in flight, etc.?

-- Ben


On Fri, Apr 6, 2018 at 11:13 AM Kenneth Knowles  wrote:

> Just naively - the use cases that Gauge addresses seem relevant, and the
> information seems feasible to gather and present. The bit that doesn't seem
> to make sense is aggregating gauges by clobbering each other. So I think
> that's just +1 Ben?
>
> On Fri, Apr 6, 2018 at 10:26 AM Raghu Angadi  wrote:
>
>> I am not opposed to removing other data types, though they are an extra
>> convenience for users.
>>
>> In Scott's example above, if the metric is a counter, what are the
>> guarantees provided? E.g. would it match the global count using GBK? If
>> yes, then gauges (especially per-key gauges) can be very useful too (e.g.
>> backlog for each Kafka partition/split).
>>
>> On Fri, Apr 6, 2018 at 10:01 AM Robert Bradshaw 
>> wrote:
>>
>>> A String API makes it clear(er) that the values will not be aggregated
>>> in any way across workers. I don't think retaining both APIs (except for
>>> possibly some short migration period) is worthwhile. On another note, I still
>>> find the distributed gauge API to be a bit odd in general.
>>>
>>> On Fri, Apr 6, 2018 at 9:46 AM Raghu Angadi  wrote:
>>>
 I would be in favor of replacing the existing Gauge.set(long) API with
> the String version and removing the old one. This would be a breaking
> change. However this is a relatively new API and is still marked
> @Experimental. Keeping the old API would retain the potential confusion.
> It's better to simplify the API surface: having two APIs makes it less
> clear which one users should choose.


 Supporting additional data types sounds good. But the above states the
 string API will replace the existing API. I do not see how a string API makes
 the semantics clearer. Semantically both are the same to the user.

 On Fri, Apr 6, 2018 at 9:31 AM Pablo Estrada 
 wrote:

> Hi Ben : D
>
> Sure, that's reasonable. And perhaps I started the discussion in the
> wrong direction. I'm not questioning the utility of Gauge metrics.
>
> What I'm saying is that Beam only supports integers, but Gauges are
> aggregated by dropping old values depending on their update times; so it
> might be desirable to not restrict the data type to just integers.
>
> -P.
>
> On Fri, Apr 6, 2018 at 9:19 AM Ben Chambers 
> wrote:
>
>> See for instance how gauge metrics are handled in Prometheus, Datadog
>> and Stackdriver monitoring. Gauges are perfect for use in distributed
>> systems, they just need to be properly labeled. Perhaps we should apply a
>> default tag or allow users to specify one.
>>
>> On Fri, Apr 6, 2018, 9:14 AM Ben Chambers 
>> wrote:
>>
>>> Some metrics backends label the value, for instance with the worker
>>> that sent it. Then the aggregation is latest per label. This makes it
>>> useful for holding values such as "memory usage" that need to hold the
>>> current value.
>>>
>>> On Fri, Apr 6, 2018, 9:00 AM Scott Wegner 
>>> wrote:
>>>
 +1 on the proposal to support a "String" gauge.

 To expand a bit, the current API doesn't make it clear that the
 gauge value is based on local state. If a runner chooses to 
 parallelize a
 DoFn across many workers, each worker will have its own local Gauge 
 metric
 and its updates will overwrite other values. For example, from the API 
 it
 looks like you could use a gauge to implement your own element count 
 metric:

 long count = 0;
 @ProcessElement
 public void processElement(ProcessContext c) {
   myGauge.set(++count);
   c.output(c.element());
 }

 This looks correct, but each worker has their own local 'count'
 field, and gauge metric updates from parallel workers will overwrite 
 each
 other rather than get aggregated. So the final value would be "the 
 number
 of elements processed on one of the workers". (The correct 
 implementation
 uses a Counter metric).

 I would be in favor of replacing the existing Gauge.set(long) API
 with the String version and removing the old one. This would be a 
 breaking
 change. However this is a relatively new API 

Re: Gradle Status [April 6]

2018-04-06 Thread Kenneth Knowles
I'm working on finding a solution for launching the Nexmark suite with each
runner. This doesn't have to be done via Gradle, but we anyhow need built
artifacts that don't require user classpath intervention.

It looks to me like the examples are also missing this - they have separate
configuration e.g. sparkRunnerPreCommit but that is overspecified compared
to a free-form launching of a main() program with a runner profile.

On Fri, Apr 6, 2018 at 11:09 AM Lukasz Cwik  wrote:

> Romain, are you talking about the profiles that exist as part of the
> archetype examples?
>
> If so, then those still exist and haven't been changed. If not, can you
> provide a link to the profile in a pom file to be clearer?
>
> On Fri, Apr 6, 2018 at 12:40 PM Romain Manni-Bucau 
> wrote:
>
>> Hi Scott,
>>
>> is it right that 2 doesn't handle the hierarchy anymore and that it
>> doesn't handle profiles for runners as it is currently with maven?
>>
>>
>> Romain Manni-Bucau
>> @rmannibucau  |  Blog
>>  | Old Blog
>>  | Github
>>  | LinkedIn
>>  | Book
>> 
>>
>> 2018-04-06 18:32 GMT+02:00 Scott Wegner :
>>
>>> I wanted to start a thread to summarize the current state of Gradle
>>> migration. We've made lots of good progress so far this week. Here's the
>>> status from what I can tell-- please add or correct anything I missed:
>>>
>>> * Release artifacts can be built and published for Snapshot and official
>>> releases [1]
>>> * Gradle-generated releases have been validated with the Apache Beam
>>> archetype generation quickstart; still needs additional validation.
>>> * Generated release pom files have correct project metadata [2]
>>> * The python pre-commits are now working in Gradle [3]
>>> * Ismaël has started a collaborative doc of Gradle tips [4] as we all
>>> learn the new system-- please add your own. This will eventually feed into
>>> official documentation on the website.
>>> * Łukasz Gajowy is working on migrating performance testing framework [5]
>>> * Daniel is working on updating documentation to refer to Gradle instead
>>> of maven
>>>
>>> If I missed anything, please add it to this thread.
>>>
>>> The general roadmap we're working towards is:
>>> (a) Publish release artifacts with Gradle (SNAPSHOT and signed releases)
>>> (b) Postcommits migrated to Gradle
>>> (c) Migrate documentation from maven to Gradle
>>> (d) Migrate perfkit suites to use Gradle
>>>
>>> For those of you that are hacking: thanks for your help so far! Progress
>>> is being roughly tracked on the Kanban [6]; please make sure the issues
>>> assigned to you are up-to-date. Many of the changes are staged on
>>> lukecwik's local branch [7]; we'll work on merging them back soon.
>>>
>>>
>>> [1] https://github.com/lukecwik/incubator-beam/pull/7
>>> [2] https://github.com/lukecwik/incubator-beam/pull/3
>>> [3] https://github.com/apache/beam/pull/5032
>>> [4]
>>> https://docs.google.com/document/d/1wR56Jef3XIPwj4DFzQKznuGPM3JDfRDVkxzeDlbdVSQ/edit
>>> [5] https://github.com/apache/beam/pull/5003
>>> [6] https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=242
>>> [7] https://github.com/lukecwik/incubator-beam/tree/gradle
>>> --
>>>
>>>
>>> Got feedback? http://go/swegner-feedback
>>>
>>
>>


Re: About the Gauge metric API

2018-04-06 Thread Bill Neubauer
A gauge API is only useful if there's a correlation to a distributed
worker, because "clobber" is not a useful aggregation method: without that
correlation, the signal doesn't map to anything actionable. Ben has already
noted that metrics backends do this, but it seems that if gauge is to be a
first-class metric in Beam, Beam also needs a first-class worker identifier,
which is currently missing.


On Fri, Apr 6, 2018 at 11:13 AM Kenneth Knowles  wrote:

> Just naively - the use cases that Gauge addresses seem relevant, and the
> information seems feasible to gather and present. The bit that doesn't seem
> to make sense is aggregating gauges by clobbering each other. So I think
> that's just +1 Ben?
>
> On Fri, Apr 6, 2018 at 10:26 AM Raghu Angadi  wrote:
>
>> I am not opposed to removing other data types, though they are an extra
>> convenience for users.
>>
>> In Scott's example above, if the metric is a counter, what are the
>> guarantees provided? E.g. would it match the global count using GBK? If
>> yes, then gauges (especially per-key gauges) can be very useful too (e.g.
>> backlog for each Kafka partition/split).
>>
>> On Fri, Apr 6, 2018 at 10:01 AM Robert Bradshaw 
>> wrote:
>>
>>> A String API makes it clear(er) that the values will not be aggregated
>>> in any way across workers. I don't think retaining both APIs (except for
>>> possibly some short migration period) is worthwhile. On another note, I still
>>> find the distributed gauge API to be a bit odd in general.
>>>
>>> On Fri, Apr 6, 2018 at 9:46 AM Raghu Angadi  wrote:
>>>
 I would be in favor of replacing the existing Gauge.set(long) API with
> the String version and removing the old one. This would be a breaking
> change. However this is a relatively new API and is still marked
> @Experimental. Keeping the old API would retain the potential confusion.
> It's better to simplify the API surface: having two APIs makes it less
> clear which one users should choose.


 Supporting additional data types sounds good. But the above states the
 string API will replace the existing API. I do not see how a string API makes
 the semantics clearer. Semantically both are the same to the user.

 On Fri, Apr 6, 2018 at 9:31 AM Pablo Estrada 
 wrote:

> Hi Ben : D
>
> Sure, that's reasonable. And perhaps I started the discussion in the
> wrong direction. I'm not questioning the utility of Gauge metrics.
>
> What I'm saying is that Beam only supports integers, but Gauges are
> aggregated by dropping old values depending on their update times; so it
> might be desirable to not restrict the data type to just integers.
>
> -P.
>
> On Fri, Apr 6, 2018 at 9:19 AM Ben Chambers 
> wrote:
>
>> See for instance how gauge metrics are handled in Prometheus, Datadog
>> and Stackdriver monitoring. Gauges are perfect for use in distributed
>> systems, they just need to be properly labeled. Perhaps we should apply a
>> default tag or allow users to specify one.
>>
>> On Fri, Apr 6, 2018, 9:14 AM Ben Chambers 
>> wrote:
>>
>>> Some metrics backends label the value, for instance with the worker
>>> that sent it. Then the aggregation is latest per label. This makes it
>>> useful for holding values such as "memory usage" that need to hold the
>>> current value.
>>>
>>> On Fri, Apr 6, 2018, 9:00 AM Scott Wegner 
>>> wrote:
>>>
 +1 on the proposal to support a "String" gauge.

 To expand a bit, the current API doesn't make it clear that the
 gauge value is based on local state. If a runner chooses to 
 parallelize a
 DoFn across many workers, each worker will have its own local Gauge 
 metric
 and its updates will overwrite other values. For example, from the API 
 it
 looks like you could use a gauge to implement your own element count 
 metric:

 long count = 0;
 @ProcessElement
 public void processElement(ProcessContext c) {
   myGauge.set(++count);
   c.output(c.element());
 }

 This looks correct, but each worker has their own local 'count'
 field, and gauge metric updates from parallel workers will overwrite 
 each
 other rather than get aggregated. So the final value would be "the 
 number
 of elements processed on one of the workers". (The correct 
 implementation
 uses a Counter metric).

 I would be in favor of replacing the existing Gauge.set(long) API
 with the String version and removing the old one. This would be a 
 breaking
 change. However this is a relatively new API and is still marked
 @Experimental. Keeping the old API would retain the potential 
 confusion.
 It's better to simplify the API surface: having 

Re: About the Gauge metric API

2018-04-06 Thread Kenneth Knowles
Just naively - the use cases that Gauge addresses seem relevant, and the
information seems feasible to gather and present. The bit that doesn't seem
to make sense is aggregating gauges by clobbering each other. So I think
that's just +1 Ben?

On Fri, Apr 6, 2018 at 10:26 AM Raghu Angadi  wrote:

> I am not opposed to removing other data types, though they are an extra
> convenience for users.
>
> In Scott's example above, if the metric is a counter, what are the
> guarantees provided? E.g. would it match the global count using GBK? If
> yes, then gauges (especially per-key gauges) can be very useful too (e.g.
> backlog for each Kafka partition/split).
>
> On Fri, Apr 6, 2018 at 10:01 AM Robert Bradshaw 
> wrote:
>
>> A String API makes it clear(er) that the values will not be aggregated in
>> any way across workers. I don't think retaining both APIs (except for
>> possibly some short migration period) is worthwhile. On another note, I still
>> find the distributed gauge API to be a bit odd in general.
>>
>> On Fri, Apr 6, 2018 at 9:46 AM Raghu Angadi  wrote:
>>
>>> I would be in favor of replacing the existing Gauge.set(long) API with
 the String version and removing the old one. This would be a breaking
 change. However this is a relatively new API and is still marked
 @Experimental. Keeping the old API would retain the potential confusion.
 It's better to simplify the API surface: having two APIs makes it less
 clear which one users should choose.
>>>
>>>
>>> Supporting additional data types sounds good. But the above states the
>>> string API will replace the existing API. I do not see how a string API makes
>>> the semantics clearer. Semantically both are the same to the user.
>>>
>>> On Fri, Apr 6, 2018 at 9:31 AM Pablo Estrada  wrote:
>>>
 Hi Ben : D

 Sure, that's reasonable. And perhaps I started the discussion in the
 wrong direction. I'm not questioning the utility of Gauge metrics.

 What I'm saying is that Beam only supports integers, but Gauges are
 aggregated by dropping old values depending on their update times; so it
 might be desirable to not restrict the data type to just integers.

 -P.

 On Fri, Apr 6, 2018 at 9:19 AM Ben Chambers 
 wrote:

> See for instance how gauge metrics are handled in Prometheus, Datadog
> and Stackdriver monitoring. Gauges are perfect for use in distributed
> systems, they just need to be properly labeled. Perhaps we should apply a
> default tag or allow users to specify one.
>
> On Fri, Apr 6, 2018, 9:14 AM Ben Chambers 
> wrote:
>
>> Some metrics backends label the value, for instance with the worker
>> that sent it. Then the aggregation is latest per label. This makes it
>> useful for holding values such as "memory usage" that need to hold the
>> current value.
>>
>> On Fri, Apr 6, 2018, 9:00 AM Scott Wegner  wrote:
>>
>>> +1 on the proposal to support a "String" gauge.
>>>
>>> To expand a bit, the current API doesn't make it clear that the
>>> gauge value is based on local state. If a runner chooses to parallelize 
>>> a
>>> DoFn across many workers, each worker will have its own local Gauge 
>>> metric
>>> and its updates will overwrite other values. For example, from the API 
>>> it
>>> looks like you could use a gauge to implement your own element count 
>>> metric:
>>>
>>> long count = 0;
>>> @ProcessElement
>>> public void processElement(ProcessContext c) {
>>>   myGauge.set(++count);
>>>   c.output(c.element());
>>> }
>>>
>>> This looks correct, but each worker has their own local 'count'
>>> field, and gauge metric updates from parallel workers will overwrite 
>>> each
>>> other rather than get aggregated. So the final value would be "the 
>>> number
>>> of elements processed on one of the workers". (The correct 
>>> implementation
>>> uses a Counter metric).
>>>
>>> I would be in favor of replacing the existing Gauge.set(long) API
>>> with the String version and removing the old one. This would be a 
>>> breaking
>>> change. However this is a relatively new API and is still marked
>>> @Experimental. Keeping the old API would retain the potential confusion.
>>> It's better to simplify the API surface: having two APIs makes it less
>>> clear which one users should choose.
>>>
>>> On Fri, Apr 6, 2018 at 8:28 AM Pablo Estrada 
>>> wrote:
>>>
 Hello all,
 As I was working on adding support for Gauges in Dataflow, some
 noted that Gauge is a fairly unusual kind of metric for a distributed
 environment, since many workers will report different values and stomp on
 each other's values all the time.

 We also looked at Flink and Dropwizard Gauge metrics [1][2], and we
 found that these use generics, and Flink expli

Re: Gradle Status [April 6]

2018-04-06 Thread Lukasz Cwik
Romain, are you talking about the profiles that exist as part of the
archetype examples?

If so, then those still exist and haven't been changed. If not, can you
provide a link to the profile in a pom file to be clearer?

On Fri, Apr 6, 2018 at 12:40 PM Romain Manni-Bucau 
wrote:

> Hi Scott,
>
> is it right that 2 doesn't handle the hierarchy anymore and that it doesn't
> handle profiles for runners as it is currently with maven?
>
>
> Romain Manni-Bucau
> @rmannibucau  |  Blog
>  | Old Blog
>  | Github
>  | LinkedIn
>  | Book
> 
>
> 2018-04-06 18:32 GMT+02:00 Scott Wegner :
>
>> I wanted to start a thread to summarize the current state of Gradle
>> migration. We've made lots of good progress so far this week. Here's the
>> status from what I can tell-- please add or correct anything I missed:
>>
>> * Release artifacts can be built and published for Snapshot and official
>> releases [1]
>> * Gradle-generated releases have been validated with the Apache Beam
>> archetype generation quickstart; still needs additional validation.
>> * Generated release pom files have correct project metadata [2]
>> * The python pre-commits are now working in Gradle [3]
>> * Ismaël has started a collaborative doc of Gradle tips [4] as we all
>> learn the new system-- please add your own. This will eventually feed into
>> official documentation on the website.
>> * Łukasz Gajowy is working on migrating performance testing framework [5]
>> * Daniel is working on updating documentation to refer to Gradle instead
>> of maven
>>
>> If I missed anything, please add it to this thread.
>>
>> The general roadmap we're working towards is:
>> (a) Publish release artifacts with Gradle (SNAPSHOT and signed releases)
>> (b) Postcommits migrated to Gradle
>> (c) Migrate documentation from maven to Gradle
>> (d) Migrate perfkit suites to use Gradle
>>
>> For those of you that are hacking: thanks for your help so far! Progress
>> is being roughly tracked on the Kanban [6]; please make sure the issues
>> assigned to you are up-to-date. Many of the changes are staged on
>> lukecwik's local branch [7]; we'll work on merging them back soon.
>>
>>
>> [1] https://github.com/lukecwik/incubator-beam/pull/7
>> [2] https://github.com/lukecwik/incubator-beam/pull/3
>> [3] https://github.com/apache/beam/pull/5032
>> [4]
>> https://docs.google.com/document/d/1wR56Jef3XIPwj4DFzQKznuGPM3JDfRDVkxzeDlbdVSQ/edit
>> [5] https://github.com/apache/beam/pull/5003
>> [6] https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=242
>> [7] https://github.com/lukecwik/incubator-beam/tree/gradle
>> --
>>
>>
>> Got feedback? http://go/swegner-feedback
>>
>
>


Re: About the Gauge metric API

2018-04-06 Thread Raghu Angadi
I am not opposed to removing other data types, though they are an extra
convenience for users.

In Scott's example above, if the metric is a counter, what are the
guarantees provided? E.g. would it match the global count using GBK? If
yes, then gauges (especially per-key gauges) can be very useful too (e.g.
backlog for each Kafka partition/split).
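
A rough sketch of what such a per-partition gauge could look like with the
current Beam Java Metrics API; the namespace, metric name and reporting hook
are illustrative assumptions, and the call would need to run inside a step
(e.g. a reader or DoFn) for the metric to be reported:

import org.apache.beam.sdk.metrics.Gauge;
import org.apache.beam.sdk.metrics.Metrics;

/** Illustrative only: report a backlog gauge per Kafka partition. */
class BacklogReporter {
  void report(int partition, long backlogElements) {
    // One gauge per partition, so workers reading different partitions
    // update different metrics instead of clobbering a single value.
    Gauge backlog = Metrics.gauge("KafkaIO", "backlog-partition-" + partition);
    backlog.set(backlogElements);
  }
}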

On Fri, Apr 6, 2018 at 10:01 AM Robert Bradshaw  wrote:

> A String API makes it clear(er) that the values will not be aggregated in
> any way across workers. I don't think retaining both APIs (except for
> possibly some short migration period) is worthwhile. On another note, I still
> find the distributed gauge API to be a bit odd in general.
>
> On Fri, Apr 6, 2018 at 9:46 AM Raghu Angadi  wrote:
>
>> I would be in favor of replacing the existing Gauge.set(long) API with
>>> the String version and removing the old one. This would be a breaking
>>> change. However this is a relatively new API and is still marked
>>> @Experimental. Keeping the old API would retain the potential confusion.
>>> It's better to simplify the API surface: having two APIs makes it less
>>> clear which one users should choose.
>>
>>
>> Supporting additional data types sounds good. But the above states the string
>> API will replace the existing API. I do not see how a string API makes the
>> semantics clearer. Semantically both are the same to the user.
>>
>> On Fri, Apr 6, 2018 at 9:31 AM Pablo Estrada  wrote:
>>
>>> Hi Ben : D
>>>
>>> Sure, that's reasonable. And perhaps I started the discussion in the
>>> wrong direction. I'm not questioning the utility of Gauge metrics.
>>>
>>> What I'm saying is that Beam only supports integers, but Gauges are
>>> aggregated by dropping old values depending on their update times; so it
>>> might be desirable to not restrict the data type to just integers.
>>>
>>> -P.
>>>
>>> On Fri, Apr 6, 2018 at 9:19 AM Ben Chambers 
>>> wrote:
>>>
 See for instance how gauge metrics are handled in Prometheus, Datadog
 and Stackdriver monitoring. Gauges are perfect for use in distributed
 systems, they just need to be properly labeled. Perhaps we should apply a
 default tag or allow users to specify one.

 On Fri, Apr 6, 2018, 9:14 AM Ben Chambers  wrote:

> Some metrics backends label the value, for instance with the worker
> that sent it. Then the aggregation is latest per label. This makes it
> useful for holding values such as "memory usage" that need to hold the
> current value.
>
> On Fri, Apr 6, 2018, 9:00 AM Scott Wegner  wrote:
>
>> +1 on the proposal to support a "String" gauge.
>>
>> To expand a bit, the current API doesn't make it clear that the gauge
>> value is based on local state. If a runner chooses to parallelize a DoFn
>> across many workers, each worker will have its own local Gauge metric and
>> its updates will overwrite other values. For example, from the API it 
>> looks
>> like you could use a gauge to implement your own element count metric:
>>
>> long count = 0;
>> @ProcessElement
>> public void processElement(ProcessContext c) {
>>   myGauge.set(++count);
>>   c.output(c.element());
>> }
>>
>> This looks correct, but each worker has their own local 'count'
>> field, and gauge metric updates from parallel workers will overwrite each
>> other rather than get aggregated. So the final value would be "the number
>> of elements processed on one of the workers". (The correct implementation
>> uses a Counter metric).
>>
>> I would be in favor of replacing the existing Gauge.set(long) API
>> with the String version and removing the old one. This would be a 
>> breaking
>> change. However this is a relatively new API and is still marked
>> @Experimental. Keeping the old API would retain the potential confusion.
>> It's better to simplify the API surface: having two APIs makes it less
>> clear which one users should choose.
>>
>> On Fri, Apr 6, 2018 at 8:28 AM Pablo Estrada 
>> wrote:
>>
>>> Hello all,
>>> As I was working on adding support for Gauges in Dataflow, some
>>> noted that Gauge is a fairly unusual kind of metric for a distributed
>>> environment, since many workers will report different values and stomp on
>>> each other's values all the time.
>>>
>>> We also looked at Flink and Dropwizard Gauge metrics [1][2], and we
>>> found that these use generics, and Flink explicitly mentions that a
>>> toString implementation is required[3].
>>>
>>> With that in mind, I'm thinking that it might make sense to 1)
>>> expand Gauge to support string values (keep int-based API for backwards
>>> compatibility), and migrate it to use string behind the covers.
>>>
>>> What does everyone think about this?
>>>
>>> Best
>>> -P.
>>>
>>> 1 -
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.3/

Re: About the Gauge metric API

2018-04-06 Thread Robert Bradshaw
A String API makes it clear(er) that the values will not be aggregated in
any way across workers. I don't think retaining both APIs (except for
possibly some short migration period) is worthwhile. On another note, I still
find the distributed gauge API to be a bit odd in general.

On Fri, Apr 6, 2018 at 9:46 AM Raghu Angadi  wrote:

> I would be in favor of replacing the existing Gauge.set(long) API with the
>> String version and removing the old one. This would be a breaking change.
>> However this is a relatively new API and is still marked @Experimental.
>> Keeping the old API would retain the potential confusion. It's better to
>> simplify the API surface: having two APIs makes it less clear which one
>> users should choose.
>
>
> Supporting additional data types sounds good. But the above states the string
> API will replace the existing API. I do not see how a string API makes the
> semantics clearer. Semantically both are the same to the user.
>
> On Fri, Apr 6, 2018 at 9:31 AM Pablo Estrada  wrote:
>
>> Hi Ben : D
>>
>> Sure, that's reasonable. And perhaps I started the discussion in the
>> wrong direction. I'm not questioning the utility of Gauge metrics.
>>
>> What I'm saying is that Beam only supports integers, but Gauges are
>> aggregated by dropping old values depending on their update times; so it
>> might be desirable to not restrict the data type to just integers.
>>
>> -P.
>>
>> On Fri, Apr 6, 2018 at 9:19 AM Ben Chambers  wrote:
>>
>>> See for instance how gauge metrics are handled in Prometheus, Datadog
>>> and Stackdriver monitoring. Gauges are perfect for use in distributed
>>> systems, they just need to be properly labeled. Perhaps we should apply a
>>> default tag or allow users to specify one.
>>>
>>> On Fri, Apr 6, 2018, 9:14 AM Ben Chambers  wrote:
>>>
 Some metrics backends label the value, for instance with the worker that
 sent it. Then the aggregation is latest per label. This makes it useful for
 holding values such as "memory usage" that need to hold the current value.

 On Fri, Apr 6, 2018, 9:00 AM Scott Wegner  wrote:

> +1 on the proposal to support a "String" gauge.
>
> To expand a bit, the current API doesn't make it clear that the gauge
> value is based on local state. If a runner chooses to parallelize a DoFn
> across many workers, each worker will have its own local Gauge metric and
> its updates will overwrite other values. For example, from the API it 
> looks
> like you could use a gauge to implement your own element count metric:
>
> long count = 0;
> @ProcessElement
> public void processElement(ProcessContext c) {
>   myGauge.set(++count);
>   c.output(c.element());
> }
>
> This looks correct, but each worker has their own local 'count' field,
> and gauge metric updates from parallel workers will overwrite each other
> rather than get aggregated. So the final value would be "the number of
> elements processed on one of the workers". (The correct implementation 
> uses
> a Counter metric).
>
> I would be in favor of replacing the existing Gauge.set(long) API with
> the String version and removing the old one. This would be a breaking
> change. However this is a relatively new API and is still marked
> @Experimental. Keeping the old API would retain the potential confusion.
> It's better to simplify the API surface: having two APIs makes it less
> clear which one users should choose.
>
> On Fri, Apr 6, 2018 at 8:28 AM Pablo Estrada 
> wrote:
>
>> Hello all,
>> As I was working on adding support for Gauges in Dataflow, some noted
>> that Gauge is a fairly unusual kind of metric for a distributed
>> environment, since many workers will report different values and stomp on
>> each other's values all the time.
>>
>> We also looked at Flink and Dropwizard Gauge metrics [1][2], and we
>> found that these use generics, and Flink explicitly mentions that a
>> toString implementation is required[3].
>>
>> With that in mind, I'm thinking that it might make sense to 1) expand
>> Gauge to support string values (keep int-based API for backwards
>> compatibility), and migrate it to use string behind the covers.
>>
>> What does everyone think about this?
>>
>> Best
>> -P.
>>
>> 1 -
>> https://ci.apache.org/projects/flink/flink-docs-release-1.3/monitoring/metrics.html#metric-types
>> 2 - https://metrics.dropwizard.io/3.1.0/manual/core/#gauges
>> 3 -
>> https://github.com/apache/flink/blob/master/docs/monitoring/metrics.md#gauge
>> JIRA issue for Gauge metrics -
>> https://issues.apache.org/jira/browse/BEAM-1616
>> --
>> Got feedback? go/pabloem-feedback
>> 
>>
> --
>
>
> Got feedback? http://go/swegner-feedback
>
 --
>> Got feedback? go/pabloem-feedback
>> 

Re: About the Gauge metric API

2018-04-06 Thread Raghu Angadi
>
> I would be in favor of replacing the existing Gauge.set(long) API with the
> String version and removing the old one. This would be a breaking change.
> However this is a relatively new API and is still marked @Experimental.
> Keeping the old API would retain the potential confusion. It's better to
> simplify the API surface: having two APIs makes it less clear which one
> users should choose.


Supporting additional data types sounds good. But the above states the string
API will replace the existing API. I do not see how a string API makes the
semantics clearer. Semantically both are the same to the user.

On Fri, Apr 6, 2018 at 9:31 AM Pablo Estrada  wrote:

> Hi Ben : D
>
> Sure, that's reasonable. And perhaps I started the discussion in the wrong
> direction. I'm not questioning the utility of Gauge metrics.
>
> What I'm saying is that Beam only supports integers, but Gauges are
> aggregated by dropping old values depending on their update times; so it
> might be desirable to not restrict the data type to just integers.
>
> -P.
>
> On Fri, Apr 6, 2018 at 9:19 AM Ben Chambers  wrote:
>
>> See for instance how gauge metrics are handled in Prometheus, Datadog and
>> Stackdriver monitoring. Gauges are perfect for use in distributed systems,
>> they just need to be properly labeled. Perhaps we should apply a default
>> tag or allow users to specify one.
>>
>> On Fri, Apr 6, 2018, 9:14 AM Ben Chambers  wrote:
>>
>>> Some metrics backends label the value, for instance with the worker that
>>> sent it. Then the aggregation is latest per label. This makes it useful for
>>> holding values such as "memory usage" that need to hold the current value.
>>>
>>> On Fri, Apr 6, 2018, 9:00 AM Scott Wegner  wrote:
>>>
 +1 on the proposal to support a "String" gauge.

 To expand a bit, the current API doesn't make it clear that the gauge
 value is based on local state. If a runner chooses to parallelize a DoFn
 across many workers, each worker will have its own local Gauge metric and
 its updates will overwrite other values. For example, from the API it looks
 like you could use a gauge to implement your own element count metric:

 long count = 0;
 @ProcessElement
 public void processElement(ProcessContext c) {
   myGauge.set(++count);
   c.output(c.element());
 }

 This looks correct, but each worker has their own local 'count' field,
 and gauge metric updates from parallel workers will overwrite each other
 rather than get aggregated. So the final value would be "the number of
 elements processed on one of the workers". (The correct implementation uses
 a Counter metric).

 I would be in favor of replacing the existing Gauge.set(long) API with
 the String version and removing the old one. This would be a breaking
 change. However this is a relatively new API and is still marked
 @Experimental. Keeping the old API would retain the potential confusion.
 It's better to simplify the API surface: having two APIs makes it less
 clear which one users should choose.

 On Fri, Apr 6, 2018 at 8:28 AM Pablo Estrada 
 wrote:

> Hello all,
> As I was working on adding support for Gauges in Dataflow, some noted
> that Gauge is a fairly unusual kind of metric for a distributed
> environment, since many workers will report different values and stomp on
> each other's values all the time.
>
> We also looked at Flink and Dropwizard Gauge metrics [1][2], and we
> found that these use generics, and Flink explicitly mentions that a
> toString implementation is required[3].
>
> With that in mind, I'm thinking that it might make sense to 1) expand
> Gauge to support string values (keep int-based API for backwards
> compatibility), and migrate it to use string behind the covers.
>
> What does everyone think about this?
>
> Best
> -P.
>
> 1 -
> https://ci.apache.org/projects/flink/flink-docs-release-1.3/monitoring/metrics.html#metric-types
> 2 - https://metrics.dropwizard.io/3.1.0/manual/core/#gauges
> 3 -
> https://github.com/apache/flink/blob/master/docs/monitoring/metrics.md#gauge
> JIRA issue for Gauge metrics -
> https://issues.apache.org/jira/browse/BEAM-1616
> --
> Got feedback? go/pabloem-feedback
> 
>
 --


 Got feedback? http://go/swegner-feedback

>>> --
> Got feedback? go/pabloem-feedback
> 
>


Re: Gradle Status [April 6]

2018-04-06 Thread Romain Manni-Bucau
Hi Scott,

is it right that 2 doesn't handle the hierarchy anymore and that it doesn't
handle profiles for runners as it is currently with maven?


Romain Manni-Bucau
@rmannibucau  |  Blog
 | Old Blog
 | Github  |
LinkedIn  | Book


2018-04-06 18:32 GMT+02:00 Scott Wegner :

> I wanted to start a thread to summarize the current state of Gradle
> migration. We've made lots of good progress so far this week. Here's the
> status from what I can tell-- please add or correct anything I missed:
>
> * Release artifacts can be built and published for Snapshot and official
> releases [1]
> * Gradle-generated releases have been validated with the Apache Beam
> archetype generation quickstart; still needs additional validation.
> * Generated release pom files have correct project metadata [2]
> * The python pre-commits are now working in Gradle [3]
> * Ismaël has started a collaborative doc of Gradle tips [4] as we all
> learn the new system-- please add your own. This will eventually feed into
> official documentation on the website.
> * Łukasz Gajowy is working on migrating performance testing framework [5]
> * Daniel is working on updating documentation to refer to Gradle instead
> of maven
>
> If I missed anything, please add it to this thread.
>
> The general roadmap we're working towards is:
> (a) Publish release artifacts with Gradle (SNAPSHOT and signed releases)
> (b) Postcommits migrated to Gradle
> (c) Migrate documentation from maven to Gradle
> (d) Migrate perfkit suites to use Gradle
>
> For those of you that are hacking: thanks for your help so far! Progress
> is being roughly tracked on the Kanban [6]; please make sure the issues
> assigned to you are up-to-date. Many of the changes are staged on
> lukecwik's local branch [7]; we'll work on merging them back soon.
>
>
> [1] https://github.com/lukecwik/incubator-beam/pull/7
> [2] https://github.com/lukecwik/incubator-beam/pull/3
> [3] https://github.com/apache/beam/pull/5032
> [4] https://docs.google.com/document/d/1wR56Jef3XIPwj4DFzQKznuGPM3JDf
> RDVkxzeDlbdVSQ/edit
> [5] https://github.com/apache/beam/pull/5003
> [6] https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=242
> [7] https://github.com/lukecwik/incubator-beam/tree/gradle
> --
>
>
> Got feedback? http://go/swegner-feedback
>


Gradle Status [April 6]

2018-04-06 Thread Scott Wegner
I wanted to start a thread to summarize the current state of Gradle
migration. We've made lots of good progress so far this week. Here's the
status from what I can tell-- please add or correct anything I missed:

* Release artifacts can be built and published for Snapshot and official
releases [1]
* Gradle-generated releases have been validated with the Apache Beam
archetype generation quickstart; still needs additional validation.
* Generated release pom files have correct project metadata [2]
* The python pre-commits are now working in Gradle [3]
* Ismaël has started a collaborative doc of Gradle tips [4] as we all learn
the new system-- please add your own. This will eventually feed into
official documentation on the website.
* Łukasz Gajowy is working on migrating performance testing framework [5]
* Daniel is working on updating documentation to refer to Gradle instead of
maven

If I missed anything, please add it to this thread.

The general roadmap we're working towards is:
(a) Publish release artifacts with Gradle (SNAPSHOT and signed releases)
(b) Postcommits migrated to Gradle
(c) Migrate documentation from maven to Gradle
(d) Migrate perfkit suites to use Gradle

For those of you that are hacking: thanks for your help so far! Progress is
being roughly tracked on the Kanban [6]; please make sure the issues
assigned to you are up-to-date. Many of the changes are staged on
lukecwik's local branch [7]; we'll work on merging them back soon.


[1] https://github.com/lukecwik/incubator-beam/pull/7
[2] https://github.com/lukecwik/incubator-beam/pull/3
[3] https://github.com/apache/beam/pull/5032
[4]
https://docs.google.com/document/d/1wR56Jef3XIPwj4DFzQKznuGPM3JDfRDVkxzeDlbdVSQ/edit
[5] https://github.com/apache/beam/pull/5003
[6] https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=242
[7] https://github.com/lukecwik/incubator-beam/tree/gradle
-- 


Got feedback? http://go/swegner-feedback


Re: About the Gauge metric API

2018-04-06 Thread Pablo Estrada
Hi Ben : D

Sure, that's reasonable. And perhaps I started the discussion in the wrong
direction. I'm not questioning the utility of Gauge metrics.

What I'm saying is that Beam only supports integers, but Gauges are
aggregated by dropping old values depending on their update times; so it
might be desirable to not restrict the data type to just integers.

-P.

On Fri, Apr 6, 2018 at 9:19 AM Ben Chambers  wrote:

> See for instance how gauge metrics are handled in Prometheus, Datadog and
> Stackdriver monitoring. Gauges are perfect for use in distributed systems,
> they just need to be properly labeled. Perhaps we should apply a default
> tag or allow users to specify one.
>
> On Fri, Apr 6, 2018, 9:14 AM Ben Chambers  wrote:
>
>> Some metrics backends label the value, for instance with the worker that
>> sent it. Then the aggregation is latest per label. This makes it useful for
>> holding values such as "memory usage" that need to hold the current value.
>>
>> On Fri, Apr 6, 2018, 9:00 AM Scott Wegner  wrote:
>>
>>> +1 on the proposal to support a "String" gauge.
>>>
>>> To expand a bit, the current API doesn't make it clear that the gauge
>>> value is based on local state. If a runner chooses to parallelize a DoFn
>>> across many workers, each worker will have its own local Gauge metric and
>>> its updates will overwrite other values. For example, from the API it looks
>>> like you could use a gauge to implement your own element count metric:
>>>
>>> long count = 0;
>>> @ProcessElement
>>> public void processElement(ProcessContext c) {
>>>   myGauge.set(++count);
>>>   c.output(c.element());
>>> }
>>>
>>> This looks correct, but each worker has their own local 'count' field,
>>> and gauge metric updates from parallel workers will overwrite each other
>>> rather than get aggregated. So the final value would be "the number of
>>> elements processed on one of the workers". (The correct implementation uses
>>> a Counter metric).
>>>
>>> I would be in favor of replacing the existing Gauge.set(long) API with
>>> the String version and removing the old one. This would be a breaking
>>> change. However this is a relatively new API and is still marked
>>> @Experimental. Keeping the old API would retain the potential confusion.
>>> It's better to simplify the API surface: having two APIs makes it less
>>> clear which one users should choose.
>>>
>>> On Fri, Apr 6, 2018 at 8:28 AM Pablo Estrada  wrote:
>>>
 Hello all,
 As I was working on adding support for Gauges in Dataflow, some noted
 that Gauge is a fairly unusual kind of metric for a distributed
 environment, since many workers will report different values and stomp on
 each other's values all the time.

 We also looked at Flink and Dropwizard Gauge metrics [1][2], and we
 found that these use generics, and Flink explicitly mentions that a
 toString implementation is required[3].

 With that in mind, I'm thinking that it might make sense to 1) expand
 Gauge to support string values (keep int-based API for backwards
 compatibility), and migrate it to use string behind the covers.

 What does everyone think about this?

 Best
 -P.

 1 -
 https://ci.apache.org/projects/flink/flink-docs-release-1.3/monitoring/metrics.html#metric-types
 2 - https://metrics.dropwizard.io/3.1.0/manual/core/#gauges
 3 -
 https://github.com/apache/flink/blob/master/docs/monitoring/metrics.md#gauge
 JIRA issue for Gauge metrics -
 https://issues.apache.org/jira/browse/BEAM-1616
 --
 Got feedback? go/pabloem-feedback
 

>>> --
>>>
>>>
>>> Got feedback? http://go/swegner-feedback
>>>
>> --
Got feedback? go/pabloem-feedback


Re: About the Gauge metric API

2018-04-06 Thread Ben Chambers
See for instance how gauge metrics are handled in Prometheus, Datadog and
Stackdriver monitoring. Gauges are perfect for use in distributed systems,
they just need to be properly labeled. Perhaps we should apply a default
tag or allow users to specify one.
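
For reference, this is roughly what a worker-labeled gauge looks like with the
Prometheus Java simpleclient (builder calls written from memory, so treat the
exact API as an assumption; metric and label names are illustrative):

import io.prometheus.client.Gauge;

public class WorkerMemoryGauge {
  // One time series per "worker" label value; aggregation is "latest per label".
  private static final Gauge MEMORY_USAGE = Gauge.build()
      .name("memory_usage_bytes")
      .help("Current memory usage, labeled by worker.")
      .labelNames("worker")
      .register();

  public static void report(String workerId, long bytes) {
    MEMORY_USAGE.labels(workerId).set(bytes);
  }
}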

On Fri, Apr 6, 2018, 9:14 AM Ben Chambers  wrote:

> Some metrics backends label the value, for instance with the worker that
> sent it. Then the aggregation is latest per label. This makes it useful for
> holding values such as "memory usage" that need to hold the current value.
>
> On Fri, Apr 6, 2018, 9:00 AM Scott Wegner  wrote:
>
>> +1 on the proposal to support a "String" gauge.
>>
>> To expand a bit, the current API doesn't make it clear that the gauge
>> value is based on local state. If a runner chooses to parallelize a DoFn
>> across many workers, each worker will have its own local Gauge metric and
>> its updates will overwrite other values. For example, from the API it looks
>> like you could use a gauge to implement your own element count metric:
>>
>> long count = 0;
>> @ProcessElement
>> public void processElement(ProcessContext c) {
>>   myGauge.set(++count);
>>   c.output(c.element());
>> }
>>
>> This looks correct, but each worker has their own local 'count' field,
>> and gauge metric updates from parallel workers will overwrite each other
>> rather than get aggregated. So the final value would be "the number of
>> elements processed on one of the workers". (The correct implementation uses
>> a Counter metric).
>>
>> I would be in favor of replacing the existing Gauge.set(long) API with
>> the String version and removing the old one. This would be a breaking
>> change. However this is a relatively new API and is still marked
>> @Experimental. Keeping the old API would retain the potential confusion.
>> It's better to simplify the API surface: having two APIs makes it less
>> clear which one users should choose.
>>
>> On Fri, Apr 6, 2018 at 8:28 AM Pablo Estrada  wrote:
>>
>>> Hello all,
>>> As I was working on adding support for Gauges in Dataflow, some noted
>>> that Gauge is a fairly unusual kind of metric for a distributed
>>> environment, since many workers will report different values and stomp on
>>> each other's values all the time.
>>>
>>> We also looked at Flink and Dropwizard Gauge metrics [1][2], and we
>>> found that these use generics, and Flink explicitly mentions that a
>>> toString implementation is required[3].
>>>
>>> With that in mind, I'm thinking that it might make sense to 1) expand
>>> Gauge to support string values (keep int-based API for backwards
>>> compatibility), and migrate it to use string behind the covers.
>>>
>>> What does everyone think about this?
>>>
>>> Best
>>> -P.
>>>
>>> 1 -
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.3/monitoring/metrics.html#metric-types
>>> 2 - https://metrics.dropwizard.io/3.1.0/manual/core/#gauges
>>> 3 -
>>> https://github.com/apache/flink/blob/master/docs/monitoring/metrics.md#gauge
>>> JIRA issue for Gauge metrics -
>>> https://issues.apache.org/jira/browse/BEAM-1616
>>> --
>>> Got feedback? go/pabloem-feedback
>>> 
>>>
>> --
>>
>>
>> Got feedback? http://go/swegner-feedback
>>
>


Re: About the Gauge metric API

2018-04-06 Thread Ben Chambers
Some metrics backends label the value, for instance with the worker that
sent it. Then the aggregation is latest per label. This makes it useful for
holding values such as "memory usage" that need to hold the current value.

On Fri, Apr 6, 2018, 9:00 AM Scott Wegner  wrote:

> +1 on the proposal to support a "String" gauge.
>
> To expand a bit, the current API doesn't make it clear that the gauge
> value is based on local state. If a runner chooses to parallelize a DoFn
> across many workers, each worker will have its own local Gauge metric and
> its updates will overwrite other values. For example, from the API it looks
> like you could use a gauge to implement your own element count metric:
>
> long count = 0;
> @ProcessElement
> public void processElement(ProcessContext c) {
>   myGauge.set(++count);
>   c.output(c.element());
> }
>
> This looks correct, but each worker has their own local 'count' field, and
> gauge metric updates from parallel workers will overwrite each other rather
> than get aggregated. So the final value would be "the number of elements
> processed on one of the workers". (The correct implementation uses a
> Counter metric).
>
> I would be in favor of replacing the existing Gauge.set(long) API with the
> String version and removing the old one. This would be a breaking change.
> However this is a relatively new API and is still marked @Experimental.
> Keeping the old API would retain the potential confusion. It's better to
> simplify the API surface: having two APIs makes it less clear which one
> users should choose.
>
> On Fri, Apr 6, 2018 at 8:28 AM Pablo Estrada  wrote:
>
>> Hello all,
>> As I was working on adding support for Gauges in Dataflow, some noted
>> that Gauge is a fairly unusual kind of metric for a distributed
>> environment, since many workers will report different values and stomp on
>> each other's values all the time.
>>
>> We also looked at Flink and Dropwizard Gauge metrics [1][2], and we found
>> that these use generics, and Flink explicitly mentions that a toString
>> implementation is required[3].
>>
>> With that in mind, I'm thinking that it might make sense to 1) expand
>> Gauge to support string values (keep int-based API for backwards
>> compatibility), and migrate it to use string behind the covers.
>>
>> What does everyone think about this?
>>
>> Best
>> -P.
>>
>> 1 -
>> https://ci.apache.org/projects/flink/flink-docs-release-1.3/monitoring/metrics.html#metric-types
>> 2 - https://metrics.dropwizard.io/3.1.0/manual/core/#gauges
>> 3 -
>> https://github.com/apache/flink/blob/master/docs/monitoring/metrics.md#gauge
>> JIRA issue for Gauge metrics -
>> https://issues.apache.org/jira/browse/BEAM-1616
>> --
>> Got feedback? go/pabloem-feedback
>> 
>>
> --
>
>
> Got feedback? http://go/swegner-feedback
>


Re: Gradle questions on Eclipse and End to End tests

2018-04-06 Thread Scott Wegner
Gradle has plugins for creating the necessary project files for various
IDEs. I haven't tried it yet, but I would recommend starting there:

* https://docs.gradle.org/current/userguide/eclipse_plugin.html
* https://docs.gradle.org/current/userguide/idea_plugin.html

On Fri, Apr 6, 2018 at 8:15 AM Łukasz Gajowy 
wrote:

>
>> 2. I'm having trouble finding relevant information for this section: E2E
>> Testing Framework
>> .
>> Does anyone know what the progress is on E2E tests in Gradle?
>>
>>
> This section seems to relate to both things like WordCountIT
> 
>  and
> the IO integration tests that we're currently developing. The latter are
> documented in more detail in the testing docs
> .
> It is a little bit outdated - I wanted to tackle this after the gradle
> migration. Finally, there is a PR on its way to run IOITs using gradle:
> 5003 . It provides an
> "integrationTest" task to replace the mentioned failsafe plugin which is
> used in maven.
>
> Best regards,
>
> Łukasz
>
-- 


Got feedback? http://go/swegner-feedback


Re: About the Gauge metric API

2018-04-06 Thread Scott Wegner
+1 on the proposal to support a "String" gauge.

To expand a bit, the current API doesn't make it clear that the gauge value
is based on local state. If a runner chooses to parallelize a DoFn across
many workers, each worker will have its own local Gauge metric and its
updates will overwrite other values. For example, from the API it looks
like you could use a gauge to implement your own element count metric:

long count = 0;  // per-worker local state

@ProcessElement
public void processElement(ProcessContext c) {
  // Each worker increments only its own 'count'; the gauge then reports
  // whichever worker's value happened to be written last.
  myGauge.set(++count);
  c.output(c.element());
}

This looks correct, but each worker has its own local 'count' field, and
gauge metric updates from parallel workers will overwrite each other rather
than get aggregated. So the final value would be "the number of elements
processed on one of the workers". (The correct implementation uses a
Counter metric).
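
For comparison, a minimal sketch of what the Counter-based version could look
like (the DoFn and metric names here are only illustrative):

import org.apache.beam.sdk.metrics.Counter;
import org.apache.beam.sdk.metrics.Metrics;
import org.apache.beam.sdk.transforms.DoFn;

public class CountingFn extends DoFn<String, String> {
  // Counter deltas reported by each worker are summed by the metrics
  // system, so parallel updates aggregate instead of overwriting each other.
  private final Counter elementsProcessed =
      Metrics.counter(CountingFn.class, "elementsProcessed");

  @ProcessElement
  public void processElement(ProcessContext c) {
    elementsProcessed.inc();
    c.output(c.element());
  }
}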

I would be in favor of replacing the existing Gauge.set(long) API with the
String version and removing the old one. This would be a breaking change, but
the API is relatively new and still marked @Experimental. Keeping the old API
around would only preserve the confusion: having two set() variants makes it
less clear which one users should choose, so it's better to simplify the API
surface.

On Fri, Apr 6, 2018 at 8:28 AM Pablo Estrada  wrote:

> Hello all,
> As I was working on adding support for Gauges in Dataflow, some noted that
> Gauge is a fairly unusual kind of metric for a distributed environment,
> since many workers will report different values and stomp on each other's
> updates all the time.
>
> We also looked at Flink and Dropwizard Gauge metrics [1][2], and we found
> that these use generics, and Flink explicitly mentions that a toString
> implementation is required[3].
>
> With that in mind, I'm thinking that it might make sense to 1) expand
> Gauge to support string values (keeping the int-based API for backwards
> compatibility), and 2) migrate it to use strings behind the covers.
>
> What does everyone think about this?
>
> Best
> -P.
>
> 1 -
> https://ci.apache.org/projects/flink/flink-docs-release-1.3/monitoring/metrics.html#metric-types
> 2 - https://metrics.dropwizard.io/3.1.0/manual/core/#gauges
> 3 -
> https://github.com/apache/flink/blob/master/docs/monitoring/metrics.md#gauge
> JIRA issue for Gauge metrics -
> https://issues.apache.org/jira/browse/BEAM-1616
> --
> Got feedback? go/pabloem-feedback
> 
>
-- 


Got feedback? http://go/swegner-feedback


About the Gauge metric API

2018-04-06 Thread Pablo Estrada
Hello all,
As I was working on adding support for Gauges in Dataflow, some noted that
Gauge is a fairly unusual kind of metric for a distributed environment,
since many workers will report different values and stomp on each other's
updates all the time.

We also looked at Flink and Dropwizard Gauge metrics [1][2], and we found
that these use generics, and Flink explicitly mentions that a toString
implementation is required[3].
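
For reference, Flink's Gauge is just a generic single-method interface whose
value gets rendered through toString; a rough sketch of implementing it (the
WatermarkGauge class and its field are illustrative, not taken from Flink or
Beam):

import org.apache.flink.metrics.Gauge;

// Flink's Gauge exposes a single getValue(); reporters turn the returned
// object into a String, which is why [3] asks for a meaningful toString.
public class WatermarkGauge implements Gauge<String> {
  private volatile String lastWatermark = "none";

  public void update(String watermark) {
    this.lastWatermark = watermark;
  }

  @Override
  public String getValue() {
    return lastWatermark;
  }
}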

With that in mind, I'm thinking that it might make sense to 1) expand Gauge
to support string values (keeping the int-based API for backwards compatibility),
and 2) migrate it to use strings behind the covers.

What does everyone think about this?

Best
-P.

1 -
https://ci.apache.org/projects/flink/flink-docs-release-1.3/monitoring/metrics.html#metric-types
2 - https://metrics.dropwizard.io/3.1.0/manual/core/#gauges
3 -
https://github.com/apache/flink/blob/master/docs/monitoring/metrics.md#gauge
JIRA issue for Gauge metrics -
https://issues.apache.org/jira/browse/BEAM-1616
-- 
Got feedback? go/pabloem-feedback


building on top of filesystem, can beam help?

2018-04-06 Thread Romain Manni-Bucau
Hi guys,

I have a use case where I'd like to expose some file system navigation to a
user and let them visualize the file system (in the Beam sense).

Technically it is a matter of being able to use glob patterns to browse the
file system using match(specs), as in the sketch below.
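
To make it concrete, here is a minimal sketch of the kind of browsing I have
in mind, using the existing FileSystems API from sdk-java-core (the path and
class name are only illustrative):

import java.util.Collections;
import java.util.List;
import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.io.fs.MatchResult;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class BrowseGlob {
  public static void main(String[] args) throws Exception {
    // Filesystems are registered and configured through PipelineOptions,
    // which is the configuration issue raised in point 2 below.
    FileSystems.setDefaultPipelineOptions(PipelineOptionsFactory.create());

    List<MatchResult> results =
        FileSystems.match(Collections.singletonList("/tmp/data/*.csv"));
    for (MatchResult result : results) {
      for (MatchResult.Metadata metadata : result.metadata()) {
        System.out.println(
            metadata.resourceId() + " (" + metadata.sizeBytes() + " bytes)");
      }
    }
  }
}

This is exactly the code I'd like to be able to run without dragging the rest
of sdk-java-core onto the classpath.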

What is important in that use case is to align the visualization and the
potential runtime so they share the same implementation/view, rather than
splitting them into two code branches that can drift apart.

Therefore I'd like to be able to reuse the Beam FileSystem, but I have a few
blockers:

1. it is nested in sdk-java-core, which brings two drawbacks:
a. it pulls in the whole Beam SDK, which is not desired in that part of the
app (it should not be visible on the classpath)
b. the dependency stack is impractical (guava, jackson, byte-buddy,
avro, joda, at least, are not desired at all here), and a shaded jar is way
too fat to be a valid dependency for that usage
2. I don't know how to configure the FS from one of its instances (I'd like
to be able to get its options class, e.g. a FileSystem#getConfigurationType
returning a PipelineOptions)

Do you think it is possible to extract the filesystem API into a
dependency-free Beam subproject (or at least a submodule) and add the
configuration hint to the API?

Romain Manni-Bucau
@rmannibucau  |  Blog
 | Old Blog
 | Github  |
LinkedIn  | Book



Re: Gradle questions on Eclipse and End to End tests

2018-04-06 Thread Łukasz Gajowy
>
>
> 2. I'm having trouble finding relevant information for this section: E2E
> Testing Framework
> .
> Does anyone know what the progress is on E2E tests in Gradle?
>
>
This section seems to relate to both things like WordCountIT

and
the IO integration tests that we're currently developing. The latter are
documented in more detail in the testing docs
.
That documentation is a little bit outdated - I wanted to tackle it after the
gradle migration. Finally, there is a PR on its way to run IOITs using gradle:
5003. It provides an "integrationTest" task to replace the failsafe plugin
mentioned there, which is used in maven.

Best regards,
Łukasz


Re: Gradle questions on Eclipse and End to End tests

2018-04-06 Thread Daniel Kulp


> On Apr 5, 2018, at 5:31 PM, Daniel Oliveira  wrote:
> 
> So I'm working on updating the Beam Contributor's Guide to swap references to 
> Maven with Gradle. I'm wondering if anyone can help with two trouble spots 
> I'm having:
> 
> 1. Has anyone set up Eclipse to work with Gradle for beam development? If so 
> can you give me a description of how that's done for this page? Beam Eclipse 
> Tips 

I started looking at this a while ago, but kind of gave up.  The Eclipse 
compiler cannot even compile Beam due to the wacky use of some of the generics 
in a few places.   Never really had time to figure out if there was anything 
that could be done about that. 



> 
> 2. I'm having trouble finding relevant information for this section: E2E 
> Testing Framework.
> Does anyone know what the progress is on E2E tests in Gradle?
> 
> Thanks
> Daniel Oliveira

-- 
Daniel Kulp
dk...@apache.org - http://dankulp.com/blog
Talend Community Coder - http://coders.talend.com



Re: [PROPOSAL] Python 3 support

2018-04-06 Thread Robbe Sneyders
Hi all,

I don't seem to have the permissions to create a Kanban board or even
assign tasks to myself. Who could help me with this?

I've updated the coders package pull request [1] and added the applied
strategy to the proposal document [2].
It would be great to get some feedback on this, so we can start moving
forward with other subpackages.

Kind regards,
Robbe

[1] https://github.com/apache/beam/pull/4990
[2]
https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing


On Mon, 2 Apr 2018 at 21:07 Robbe Sneyders  wrote:

> Hello Robert,
>
> I think a Kanban board on Jira as proposed by Ahmet can be helpful for
> this. I'll look into setting one up tomorrow.
>
> In the meantime, you can find the first pull request with the updated
> coders package here:
> https://github.com/apache/beam/pull/4990
>
> Kind regards,
> Robbe
>
> On Fri, 30 Mar 2018 at 18:01 Robert Bradshaw  wrote:
>
>> On Fri, Mar 30, 2018 at 8:39 AM Robbe Sneyders 
>> wrote:
>>
>>> Thanks Ahmet and Robert,
>>>
>>> I think we can work on different subpackages in parallel, but it's
>>> important to apply the same strategy everywhere. I'm currently working on
>>> applying steps 1 (which was mostly done already) and 2 of the proposal to the
>>> coders subpackage to create a first pull request. We can then discuss the
>>> applied strategy in detail before merging and applying it to the other
>>> subpackages.
>>>
>>
>> Sounds good. Again, could you document (in a more permanent/easy to look
>> up state than email) when packages are started/done?
>>
>>
>>> This strategy also includes the choice of automated tools. I'm focusing
>>> on writing python 3 code with python 2 compatibility, which means depending
>>> on the future package instead of the six package (which is already used in
>>> some places in the current code base). I have already noticed that this
>>> indeed requires a lot of manual work after running the automated script.
>>> The future package supports python 3.3+ compatibility, so I don't think
>>> there is a higher cost supporting 3.4 compared to 3.5+.
>>>
>>
>> Sure. It may incur a higher maintenance burden long-term though.
>> (Basically, if we go out the door with 3.4 it's a promise to support it for
>> some time to come.)
>>
>>
>>> I have already added a tox environment to run pylint2 with the --py3k
>>> argument per updated subpackage, which should help avoid regression between
>>> step 2 and step 3 of the proposal. This update will be pushed with the
>>> first pull request.
>>>
>>> Kind regards,
>>> Robbe
>>>
>>>
>>> On Fri, 30 Mar 2018 at 02:22 Robert Bradshaw 
>>> wrote:
>>>
 Thank you, Robbe, for your offer to help with contributions here. I
 read over your doc and the one thing I'd like to add is that this work is
 very parallelizable, but if we have enough people looking at it we'll want
 some way to coordinate so as to not overlap work (or just waste time
 discovering what's been done). Tracking individual JIRAs and PRs gets
 unwieldy; perhaps a spreadsheet with modules/packages on one axis and the
 various automated/manual conversions along the other would be helpful?

 A note on automated tools: they're sometimes overly conservative, so we
 should be sure to review the changes manually. (A typical example of this
 is unnecessarily importing six.moves.xrange when there was no big reason to
 use xrange over range in Python 2, or conversely using list(range(...)) in
 Python 3.)

 Also, +1 to targeting 3.4+ and upgrading tox to prevent regressions.
 If there's a cost to supporting 3.4 as opposed to requiring 3.5+ we should
 identify it and decide that before widespread announcement.

 On Tue, Mar 27, 2018 at 2:27 PM Ahmet Altay  wrote:

>
>
> On Tue, Mar 27, 2018 at 7:12 AM, Holden Karau 
> wrote:
>
>>
>> On Tue, Mar 27, 2018 at 4:27 AM Robbe Sneyders 
>> wrote:
>>
>>> Hi Anand,
>>>
>>> Thanks for the feedback.
>>>
>>> It should be no problem to run everything on DataflowRunner as well.
>>> Are there any performance tests in place to check for performance
>>> regressions?
>>>
>>
> Yes there is a suite (
> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_beam_PerformanceTests_Python.groovy).
> It may not be very comprehensive and it has been failing for a while. I
> would not block python 3 work on performance for now. That is the
> unfortunate state of things.
>
> If anybody in the community is interested, this would be a great
> opportunity to help with benchmarks in general.
>
>
>>
>>> Some questions were raised in the proposal document which I want to
>>> add to this conversation:
>>>
>>> The first comment was about the targeted python 3 versions. We
>>> proposed to target 3.6 since it is the latest version available and 
>>> added
>>> 3.5 because 3.6 a

Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-04-06 Thread Romain Manni-Bucau
+1 to get 2.5 out asap (it fixes blockers, so it's always good to let users upgrade)
+1000 to split beam releases (and even repos) by concern


Romain Manni-Bucau
@rmannibucau  |  Blog
 | Old Blog
 | Github  |
LinkedIn  | Book


2018-04-06 10:48 GMT+02:00 Jean-Baptiste Onofré :

> Hi guys,
>
> Apache Beam 2.4.0 has been released on March 20th.
>
> According to our release cycle (roughly 6 weeks), we should think about
> 2.5.0.
>
> I volunteer to tackle this release.
>
> I'm proposing the following items:
>
> 1. We start the Jira triage now, up to Tuesday
> 2. I would like to cut the release on Tuesday night (Europe time)
> 2bis. I think it's wiser to still use Maven for this release. Do you think
> we will be ready to try a release with Gradle?
>
> After this release, I would like a discussion about:
> 1. Gradle release (if we release 2.5.0 with Maven)
> 2. Isolate the release cycle per Beam part. I think it would be interesting
> to have different release cycles: SDKs, DSLs, Runners, IOs. That's another
> discussion; I will start a thread about that.
>
> Thoughts ?
>
> Regards
> JB
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


[PROPOSAL] Preparing 2.5.0 release next week

2018-04-06 Thread Jean-Baptiste Onofré
Hi guys,

Apache Beam 2.4.0 has been released on March 20th.

According to our release cycle (roughly 6 weeks), we should think about
2.5.0.

I volunteer to tackle this release.

I'm proposing the following items:

1. We start the Jira triage now, up to Tuesday
2. I would like to cut the release on Tuesday night (Europe time)
2bis. I think it's wiser to still use Maven for this release. Do you think we
will be ready to try a release with Gradle?

After this release, I would like a discussion about:
1. Gradle release (if we release 2.5.0 with Maven)
2. Isolate the release cycle per Beam part. I think it would be interesting to have
different release cycles: SDKs, DSLs, Runners, IOs. That's another discussion; I
will start a thread about that.

Thoughts ?

Regards
JB
-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Gradle Tips and tricks

2018-04-06 Thread Jean-Baptiste Onofré
By the way, I'm doing a new pass on the current state of the gradle build.

Regards
JB

On 04/06/2018 09:03 AM, Ismaël Mejía wrote:
> After some discussion on slack it is clear that we need to document
> some of the gradle replacements of our common maven commands. We
> started a shared doc yesterday to collect some of those, along with other
> gradle tips and tricks. I invite everyone who can help to add their favorite
> gradle 'incantations' and other related knowledge there. We will
> migrate this info to the website afterwards.
> 
> https://s.apache.org/beam-gradle-tips-edit
> 

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Gradle Tips and tricks

2018-04-06 Thread Jean-Baptiste Onofré
Agreed, that's what I mentioned this morning on Slack.

I think it would be great to have a summary of the current state on the dev
mailing list.

At the end of the day, the contribution guide should be updated (we have a Jira
about that, AFAIR).

Regards
JB

On 04/06/2018 09:03 AM, Ismaël Mejía wrote:
> After some discussion on slack it is clear that we need to document
> some of the gradle replacements of our common maven commands. We
> started a shared doc yesterday to collect some of those, along with other
> gradle tips and tricks. I invite everyone who can help to add their favorite
> gradle 'incantations' and other related knowledge there. We will
> migrate this info to the website afterwards.
> 
> https://s.apache.org/beam-gradle-tips-edit
> 

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Gradle Tips and tricks

2018-04-06 Thread Ismaël Mejía
After some discussion on slack it is clear that we need to document
some of the gradle replacements of our common maven commands. We
started a shared doc yesterday to collect some of those, along with other
gradle tips and tricks. I invite everyone who can help to add their favorite
gradle 'incantations' and other related knowledge there. We will
migrate this info to the website afterwards.

https://s.apache.org/beam-gradle-tips-edit