Apache Beam Dataflow runner throwing setup error

2018-03-22 Thread Rajesh Hegde
Hi,
We are building a data pipeline using the Beam Python SDK and trying to run it
on Dataflow, but we get the error below:

*A setup error was detected in
beamapp--0322102737-03220329-8a74-harness-lm6v. Please refer to the
worker-startup log for detailed information.*

However, we could not find the detailed worker-startup logs.

We tried increasing the memory size, worker count, etc., but we still get the
same error.

Here is the command we use:

    python run.py \
      --project=xyz \
      --runner=DataflowRunner \
      --staging_location=gs://xyz/staging \
      --temp_location=gs://xyz/temp \
      --requirements_file=requirements.txt \
      --worker_machine_type n1-standard-8 \
      --num_workers 2
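
A side note on the command above: it mixes the "--flag=value" and "--flag value" spellings. As far as I know, the Beam Python SDK parses pipeline options with argparse, which accepts both, so the mixed style alone is unlikely to matter. Below is a minimal sketch with plain argparse. The parser is a hypothetical stand-in that declares only three of the flags; it is not Beam's real option set and says nothing about the cause of the setup error.

```python
import argparse

# Hypothetical stand-in parser: it declares only three of the flags from the
# command above and is NOT Beam's real option set.
parser = argparse.ArgumentParser()
parser.add_argument("--runner")
parser.add_argument("--worker_machine_type")
parser.add_argument("--num_workers", type=int)

# Same mixed style as the command: '=' for one flag, a space for the others.
argv = [
    "--runner=DataflowRunner",
    "--worker_machine_type", "n1-standard-8",
    "--num_workers", "2",
]

# parse_known_args returns the parsed namespace plus any leftover arguments.
known, unknown = parser.parse_known_args(argv)
print(known.runner, known.worker_machine_type, known.num_workers)
```

Both spellings end up as the same parsed values, and nothing is left over in `unknown`.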


Pipeline snippet:

    data = pipeline | "load data" >> beam.io.Read(
        beam.io.BigQuerySource(query="SELECT * FROM abc_table LIMIT 100")
    )

    data | "filter data" >> beam.Filter(lambda x: x.get('column_name') == value)


The pipeline above just loads data from BigQuery and filters it on a column
value. It works like a charm with the DirectRunner but fails on Dataflow.
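
The filter step itself can be exercised without any runner, which helps separate pipeline logic from worker setup. The sketch below uses plain Python dictionaries in place of BigQuery rows. The helper name make_filter and the sample rows are assumptions made for illustration; they are not part of the original pipeline.

```python
def make_filter(column, expected):
    """Build a row predicate (hypothetical helper, not from the original code)."""
    def predicate(row):
        # Same comparison as the lambda in the snippet: a missing column
        # yields None from .get(), which does not equal the expected value.
        return row.get(column) == expected
    return predicate


# Made-up sample rows standing in for BigQuery results.
rows = [{"column_name": "a"}, {"column_name": "b"}, {}]

keep = make_filter("column_name", "a")
kept = [row for row in rows if keep(row)]
print(kept)
```

A predicate built this way takes its comparison value as an explicit argument instead of relying on a variable from the enclosing module, which makes it easy to test on its own.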

Are we making an obvious setup mistake? Is anyone else getting the same error?
We could use some help resolving the issue.


-- 

*Rajesh Hegde | Lead Product Developer | Datalicious*
*e*: rhe...@datalicious.com | *m*: +919167571827
*a*: L-77, 15th Cross Rd, Sector 6, HSR Layout,
Bangalore Karnataka- 560102
*w*: www.datalicious.com




Re: Jenkins wait times

2018-03-22 Thread Udi Meiri
https://github.com/apache/beam/pull/4942 for anyone following.

On Thu, Mar 22, 2018 at 10:45 AM Lukasz Cwik  wrote:

> The maximum passing runtime was 70 mins in Gradle, so it seems reasonable
> to set it to 90 mins.
> For Maven, it's about 130 mins so setting it to 150 mins is also reasonable.
>
> It turns out that ~1/8 of builds were aborted due to taking too long in
> both Maven / Java due to tests getting stuck.
> Would be worthwhile to set a maximum runtime per test to prevent this.
>
> Feel free to open a PR and send it my way.
>
>
> On Thu, Mar 22, 2018 at 10:14 AM Udi Meiri  wrote:
>
>> Hi,
>> I've been seeing increased wait times on Jenkins. It's frustrating to
>> wait 8h for a build, or 4h for it just to schedule.
>>
>> Data point:
>>
>> https://builds.apache.org/job/beam_PreCommit_Java_GradleBuild/buildTimeTrend
>> These builds seem to be timing out after 4 hours. Can we reduce the
>> timeout to just 1 hour?
>>
>>
>>




Re: Gradle status

2018-03-22 Thread Reuven Lax
Let's back up for a second.

Earlier in the thread we agreed to organize a community "fixit" day to try
and migrate remaining Maven items to Gradle. I had thought that Romain had
volunteered to run this, but reading back in the thread it appears that I
misunderstood this. I would suggest that we organize this first, and make
the concerted push to migrate the remaining items. After this is done, we
can evaluate the state we're left in and hold a process vote if necessary.

I can volunteer to help coordinate this fixit.

Reuven

On Thu, Mar 22, 2018 at 1:35 PM Dan Halperin  wrote:

> On Thu, Mar 22, 2018 at 11:19 AM, Chamikara Jayalath wrote:
>
>> I don't think incremental progress is a bad thing as long as we are
>> making progress towards the goal. Do we need better metrics (a weekly email
>> ?) about the progress towards moving everything to Gradle ? I agree with
>> others who pointed out that there are many unresolved JIRAs and simply
>> deleting Maven artifacts could break many things (for example, performance
>> tests).
>>
>
> The problem does not seem to be incremental progress on its face, or a
> lack of metrics.
>
> The problem is that there are two build systems with separate features and
> issues, doubling (or worse) Jenkins cycles, mental effort, maintenance
> burden, etc. It hurts the community and casual contributors.
>
> As Luke suggested,
> > A process vote can be happen if the in-between state is too painful to
> maintain.
>
> Given that the in-between state has lasted so long, and there is it may be
> time.
>
> Dan
>
>
>
>>
>> Thanks,
>> Cham
>>
>>
>> On Thu, Mar 22, 2018 at 10:56 AM Romain Manni-Bucau <
>> rmannibu...@gmail.com> wrote:
>>
>>>
>>>
>>> On 22 March 2018 at 18:49, "Dan Halperin" wrote:
>>>
>>> It seems that a few groups are talking past each other.
>>>
>>> * A sizable contingent is interested in a move to Gradle -- it shows
>>> promise, but the work is incomplete.
>>> * Another contingent noticing the large burden of maintaining multiple
>>> build systems. FWICT, both test suites have been broken quite a lot
>>> recently, mainly the gradle ones, which is a cost to the community. This is
>>> creating a barrier to entry for new contributors – especially those who
>>> don't work in the same room or do their primary development in a different
>>> repository.
>>>
>>> I don't see this situation being resolved to anyone's satisfaction until
>>> there's only one build system left. The onus is clearly on the Gradle
>>> promoters to finish the work.
>>>
>>> Luke made a suggestion 2.5 months ago that we should have a process vote
>>> if this situation is untenable. It seems like we're there.
>>>
>>>
>>> Yes but beam voted to move to gradle so we should but we shouldnt
>>> maintain 2 build systems for more than 2 months (weway overpassed that) and
>>> therefore the vote should be cancelled or validated by an action.
>>>
>>> I understand you want gradle but you dont want to pay the cost to move
>>> to gradle, it doesnt work for the project do please another option
>>> (rollbacking gradle or removing maven, all other options are negative for
>>> the project and a pain for committers and contributors whatever you think).
>>>
>>>
>>>
>>> Thanks,
>>> Dan
>>>
>>> On Thu, Mar 22, 2018 at 10:30 AM, Romain Manni-Bucau <
>>> rmannibu...@gmail.com> wrote:
>>>
 Ok so to be clear for any contributor (which is the goal of this
 thread): maven is still the main build system and no need to maintain
 gradle in PR then until beam switches.

 Im more than fine with that.

 On 22 March 2018 at 18:22, "Alan Myrvold" wrote:

> I think the investment in gradle is worthwhile, and incrementally we
> will continue to make progress. From what I've seem, gradle is a good fit
> for this project and a path to a faster, more reliable build system.
>
> pull/4812  creates the
> release artifacts, although it is not hooked up yet with authentication.
>
> I expect to help make incremental progress over the next month
> converting some of the integration tests, but welcome incremental
> improvements from others.
>
>
>
> On Thu, Mar 22, 2018 at 9:57 AM Romain Manni-Bucau <
> rmannibu...@gmail.com> wrote:
>
>>
>>
>> 2018-03-22 17:45 GMT+01:00 Lukasz Cwik :
>>
>>> what do we do? "Gradle migration will happen incrementally."
>>>
>>> "last months prooved beam cant maintain 2 systems, easier with that
>>> state is then to drop gradle since it is a 0 investment compared to the
>>> opposite"
>>> Its unfortunate that you feel this way but many people do not share
>>> your opinion.
>>>
>>
>> And a lot do so when a project is 50-50 it is time to act.
>>
>> Incrementally kind of means never (makes 4 months and nothing really
>> 

Re: Gradle status

2018-03-22 Thread Dan Halperin
On Thu, Mar 22, 2018 at 11:19 AM, Chamikara Jayalath 
wrote:

> I don't think incremental progress is a bad thing as long as we are making
> progress towards the goal. Do we need better metrics (a weekly email ?)
> about the progress towards moving everything to Gradle ? I agree with
> others who pointed out that there are many unresolved JIRAs and simply
> deleting Maven artifacts could break many things (for example, performance
> tests).
>

The problem does not seem to be incremental progress on its face, or a lack
of metrics.

The problem is that there are two build systems with separate features and
issues, doubling (or worse) Jenkins cycles, mental effort, maintenance
burden, etc. It hurts the community and casual contributors.

As Luke suggested,
> A process vote can happen if the in-between state is too painful to
> maintain.

Given that the in-between state has lasted so long, it may be time.

Dan



>
> Thanks,
> Cham
>
>
> On Thu, Mar 22, 2018 at 10:56 AM Romain Manni-Bucau 
> wrote:
>
>>
>>
>> On 22 March 2018 at 18:49, "Dan Halperin" wrote:
>>
>> It seems that a few groups are talking past each other.
>>
>> * A sizable contingent is interested in a move to Gradle -- it shows
>> promise, but the work is incomplete.
>> * Another contingent noticing the large burden of maintaining multiple
>> build systems. FWICT, both test suites have been broken quite a lot
>> recently, mainly the gradle ones, which is a cost to the community. This is
>> creating a barrier to entry for new contributors – especially those who
>> don't work in the same room or do their primary development in a different
>> repository.
>>
>> I don't see this situation being resolved to anyone's satisfaction until
>> there's only one build system left. The onus is clearly on the Gradle
>> promoters to finish the work.
>>
>> Luke made a suggestion 2.5 months ago that we should have a process vote
>> if this situation is untenable. It seems like we're there.
>>
>>
>> Yes but beam voted to move to gradle so we should but we shouldnt
>> maintain 2 build systems for more than 2 months (weway overpassed that) and
>> therefore the vote should be cancelled or validated by an action.
>>
>> I understand you want gradle but you dont want to pay the cost to move to
>> gradle, it doesnt work for the project do please another option
>> (rollbacking gradle or removing maven, all other options are negative for
>> the project and a pain for committers and contributors whatever you think).
>>
>>
>>
>> Thanks,
>> Dan
>>
>> On Thu, Mar 22, 2018 at 10:30 AM, Romain Manni-Bucau <
>> rmannibu...@gmail.com> wrote:
>>
>>> Ok so to be clear for any contributor (which is the goal of this
>>> thread): maven is still the main build system and no need to maintain
>>> gradle in PR then until beam switches.
>>>
>>> Im more than fine with that.
>>>
>>> On 22 March 2018 at 18:22, "Alan Myrvold" wrote:
>>>
 I think the investment in gradle is worthwhile, and incrementally we
 will continue to make progress. From what I've seem, gradle is a good fit
 for this project and a path to a faster, more reliable build system.

 pull/4812  creates the
 release artifacts, although it is not hooked up yet with authentication.

 I expect to help make incremental progress over the next month
 converting some of the integration tests, but welcome incremental
 improvements from others.



 On Thu, Mar 22, 2018 at 9:57 AM Romain Manni-Bucau <
 rmannibu...@gmail.com> wrote:

>
>
> 2018-03-22 17:45 GMT+01:00 Lukasz Cwik :
>
>> what do we do? "Gradle migration will happen incrementally."
>>
>> "last months prooved beam cant maintain 2 systems, easier with that
>> state is then to drop gradle since it is a 0 investment compared to the
>> opposite"
>> Its unfortunate that you feel this way but many people do not share
>> your opinion.
>>
>
> And a lot do so when a project is 50-50 it is time to act.
>
> Incrementally kind of means never (makes 4 months and nothing really
> changed in PRs and habits, gradle maintener(s) are still alone)
>
>
>>
>>
>> On Thu, Mar 22, 2018 at 9:32 AM Romain Manni-Bucau <
>> rmannibu...@gmail.com> wrote:
>>
>>> @Valentyn: concretely any user can PR and be part of that process so
>>> anyone can do it wrong (me first)
>>> @Luskasz, Hennking: fine but what do we do? last months prooved beam
>>> cant maintain 2 systems, easier with that state is then to drop gradle
>>> since it is a 0 investment compared to the opposite
>>>
>>>
>>> Romain Manni-Bucau
>>> @rmannibucau  |  Blog
>>>  | Old Blog
>>>  | 

Re: Gradle status

2018-03-22 Thread Chamikara Jayalath
I don't think incremental progress is a bad thing as long as we are making
progress towards the goal. Do we need better metrics (a weekly email?)
about the progress towards moving everything to Gradle? I agree with
others who pointed out that there are many unresolved JIRAs and simply
deleting Maven artifacts could break many things (for example, performance
tests).

Thanks,
Cham

On Thu, Mar 22, 2018 at 10:56 AM Romain Manni-Bucau 
wrote:

>
>
> On 22 March 2018 at 18:49, "Dan Halperin" wrote:
>
> It seems that a few groups are talking past each other.
>
> * A sizable contingent is interested in a move to Gradle -- it shows
> promise, but the work is incomplete.
> * Another contingent noticing the large burden of maintaining multiple
> build systems. FWICT, both test suites have been broken quite a lot
> recently, mainly the gradle ones, which is a cost to the community. This is
> creating a barrier to entry for new contributors – especially those who
> don't work in the same room or do their primary development in a different
> repository.
>
> I don't see this situation being resolved to anyone's satisfaction until
> there's only one build system left. The onus is clearly on the Gradle
> promoters to finish the work.
>
> Luke made a suggestion 2.5 months ago that we should have a process vote
> if this situation is untenable. It seems like we're there.
>
>
> Yes but beam voted to move to gradle so we should but we shouldnt maintain
> 2 build systems for more than 2 months (weway overpassed that) and
> therefore the vote should be cancelled or validated by an action.
>
> I understand you want gradle but you dont want to pay the cost to move to
> gradle, it doesnt work for the project do please another option
> (rollbacking gradle or removing maven, all other options are negative for
> the project and a pain for committers and contributors whatever you think).
>
>
>
> Thanks,
> Dan
>
> On Thu, Mar 22, 2018 at 10:30 AM, Romain Manni-Bucau <
> rmannibu...@gmail.com> wrote:
>
>> Ok so to be clear for any contributor (which is the goal of this thread):
>> maven is still the main build system and no need to maintain gradle in PR
>> then until beam switches.
>>
>> Im more than fine with that.
>>
>> On 22 March 2018 at 18:22, "Alan Myrvold" wrote:
>>
>>> I think the investment in gradle is worthwhile, and incrementally we
>>> will continue to make progress. From what I've seem, gradle is a good fit
>>> for this project and a path to a faster, more reliable build system.
>>>
>>> pull/4812  creates the
>>> release artifacts, although it is not hooked up yet with authentication.
>>>
>>> I expect to help make incremental progress over the next month
>>> converting some of the integration tests, but welcome incremental
>>> improvements from others.
>>>
>>>
>>>
>>> On Thu, Mar 22, 2018 at 9:57 AM Romain Manni-Bucau <
>>> rmannibu...@gmail.com> wrote:
>>>


 2018-03-22 17:45 GMT+01:00 Lukasz Cwik :

> what do we do? "Gradle migration will happen incrementally."
>
> "last months prooved beam cant maintain 2 systems, easier with that
> state is then to drop gradle since it is a 0 investment compared to the
> opposite"
> Its unfortunate that you feel this way but many people do not share
> your opinion.
>

 And a lot do so when a project is 50-50 it is time to act.

 Incrementally kind of means never (makes 4 months and nothing really
 changed in PRs and habits, gradle maintener(s) are still alone)


>
>
> On Thu, Mar 22, 2018 at 9:32 AM Romain Manni-Bucau <
> rmannibu...@gmail.com> wrote:
>
>> @Valentyn: concretely any user can PR and be part of that process so
>> anyone can do it wrong (me first)
>> @Luskasz, Hennking: fine but what do we do? last months prooved beam
>> cant maintain 2 systems, easier with that state is then to drop gradle
>> since it is a 0 investment compared to the opposite
>>
>>
>> Romain Manni-Bucau
>> @rmannibucau | Blog | Old Blog | Github | LinkedIn | Book
>>
>> 2018-03-22 17:24 GMT+01:00 Lukasz Cwik :
>>
>>> Romain, from the previous discussions several people agreed that
>>> running a fixit that migrated Maven to Gradle over a 1 or 2 day period 
>>> was
>>> worthwhile but there was nobody in the community with the time 
>>> commitment
>>> to organize and run it so the status quo plan remained where the Gradle
>>> migration will happen incrementally.
>>>
>>>
>>> On Thu, Mar 22, 2018 

Re: Flink Runner display transform bug

2018-03-22 Thread Aljoscha Krettek
I think this might be a problem in how we set display names? Would you mind 
opening a Jira issue for that?

> On 22. Mar 2018, at 04:27, Alexey Diomin  wrote:
> 
> Hi
> 
> I see this display in versions 2.3.0 and 2.4.0.
> The main problem is that RawParDo doesn't provide the correct name.
> 
> As a hotfix I made a custom build with a small fix, but I'm not sure it is
> correct.
> 
> @Override
> protected String getKindString() {
>   return protoTransform.getUniqueName();
> }
> 
> Can anybody check this problem on other runners?
> 
> Thanks,
> Alexey
> 



[Discuss] On Jira and assignation of work

2018-03-22 Thread Pablo Estrada
Hello all,
this email is to follow up with specific questions about the new
contributor experience, specifically the usage of JIRA and the
assignment of work.
To look at the basic ideas, see my previous doc[1].
The questions to discuss are:

1. What should a component owner do when a new JIRA is created and assigned
to them?
2. What should a new contributor do before starting to work on a particular
work item?
3. What parts of the JIRA[2] section in the contributor guide should be
improved?

Thanks!
-P.

1 -
https://docs.google.com/document/d/1WaK39qrrG_P50FOMHifJhrdHZYmjOOf8MgoObwCZI50/edit

2 -
https://beam.apache.org/contribute/contribution-guide/#jira-issue-tracker
-- 
Got feedback? go/pabloem-feedback


Re: Gradle status

2018-03-22 Thread Romain Manni-Bucau
On 22 March 2018 at 18:49, "Dan Halperin" wrote:

It seems that a few groups are talking past each other.

* A sizable contingent is interested in a move to Gradle -- it shows
promise, but the work is incomplete.
* Another contingent noticing the large burden of maintaining multiple
build systems. FWICT, both test suites have been broken quite a lot
recently, mainly the gradle ones, which is a cost to the community. This is
creating a barrier to entry for new contributors – especially those who
don't work in the same room or do their primary development in a different
repository.

I don't see this situation being resolved to anyone's satisfaction until
there's only one build system left. The onus is clearly on the Gradle
promoters to finish the work.

Luke made a suggestion 2.5 months ago that we should have a process vote if
this situation is untenable. It seems like we're there.


Yes, but Beam voted to move to Gradle, so we should; but we shouldn't maintain
2 build systems for more than 2 months (we are way past that), and
therefore the vote should be cancelled or validated by an action.

I understand you want Gradle but you don't want to pay the cost to move to
Gradle. That doesn't work for the project, so please pick another option
(rolling back Gradle or removing Maven; all other options are negative for
the project and a pain for committers and contributors, whatever you think).



Thanks,
Dan

On Thu, Mar 22, 2018 at 10:30 AM, Romain Manni-Bucau 
wrote:

> Ok so to be clear for any contributor (which is the goal of this thread):
> maven is still the main build system and no need to maintain gradle in PR
> then until beam switches.
>
> Im more than fine with that.
>
> On 22 March 2018 at 18:22, "Alan Myrvold" wrote:
>
>> I think the investment in gradle is worthwhile, and incrementally we will
>> continue to make progress. From what I've seem, gradle is a good fit for
>> this project and a path to a faster, more reliable build system.
>>
>> pull/4812  creates the release
>> artifacts, although it is not hooked up yet with authentication.
>>
>> I expect to help make incremental progress over the next month converting
>> some of the integration tests, but welcome incremental improvements from
>> others.
>>
>>
>>
>> On Thu, Mar 22, 2018 at 9:57 AM Romain Manni-Bucau 
>> wrote:
>>
>>>
>>>
>>> 2018-03-22 17:45 GMT+01:00 Lukasz Cwik :
>>>
 what do we do? "Gradle migration will happen incrementally."

 "last months prooved beam cant maintain 2 systems, easier with that
 state is then to drop gradle since it is a 0 investment compared to the
 opposite"
 Its unfortunate that you feel this way but many people do not share
 your opinion.

>>>
>>> And a lot do so when a project is 50-50 it is time to act.
>>>
>>> Incrementally kind of means never (makes 4 months and nothing really
>>> changed in PRs and habits, gradle maintener(s) are still alone)
>>>
>>>


 On Thu, Mar 22, 2018 at 9:32 AM Romain Manni-Bucau <
 rmannibu...@gmail.com> wrote:

> @Valentyn: concretely any user can PR and be part of that process so
> anyone can do it wrong (me first)
> @Luskasz, Hennking: fine but what do we do? last months prooved beam
> cant maintain 2 systems, easier with that state is then to drop gradle
> since it is a 0 investment compared to the opposite
>
>
> Romain Manni-Bucau
> @rmannibucau | Blog | Old Blog | Github | LinkedIn | Book
>
> 2018-03-22 17:24 GMT+01:00 Lukasz Cwik :
>
>> Romain, from the previous discussions several people agreed that
>> running a fixit that migrated Maven to Gradle over a 1 or 2 day period 
>> was
>> worthwhile but there was nobody in the community with the time commitment
>> to organize and run it so the status quo plan remained where the Gradle
>> migration will happen incrementally.
>>
>>
>> On Thu, Mar 22, 2018 at 8:53 AM Henning Rohde 
>> wrote:
>>
>>> My understanding was the same as Ismaël's. I don't think breaking
>>> the build with a large known gaps (but not fully known cost) is 
>>> practical.
>>> Also, most items in the jira are not even assigned yet.
>>>
>>>
>>> On Thu, Mar 22, 2018 at 8:03 AM Romain Manni-Bucau <
>>> rmannibu...@gmail.com> wrote:
>>>
 Not really Ismaël, this thread was about to do it at once and have
 1 day to fix it all.

 As mentionned at the very beginning nobody maintains the 2 system
 so 

Re: Gradle status

2018-03-22 Thread Dan Halperin
It seems that a few groups are talking past each other.

* A sizable contingent is interested in a move to Gradle -- it shows
promise, but the work is incomplete.
* Another contingent is noticing the large burden of maintaining multiple
build systems. FWICT, both test suites have been broken quite a lot
recently, mainly the gradle ones, which is a cost to the community. This is
creating a barrier to entry for new contributors – especially those who
don't work in the same room or do their primary development in a different
repository.

I don't see this situation being resolved to anyone's satisfaction until
there's only one build system left. The onus is clearly on the Gradle
promoters to finish the work.

Luke made a suggestion 2.5 months ago that we should have a process vote if
this situation is untenable. It seems like we're there.

Thanks,
Dan

On Thu, Mar 22, 2018 at 10:30 AM, Romain Manni-Bucau 
wrote:

> Ok so to be clear for any contributor (which is the goal of this thread):
> maven is still the main build system and no need to maintain gradle in PR
> then until beam switches.
>
> Im more than fine with that.
>
> On 22 March 2018 at 18:22, "Alan Myrvold" wrote:
>
>> I think the investment in gradle is worthwhile, and incrementally we will
>> continue to make progress. From what I've seem, gradle is a good fit for
>> this project and a path to a faster, more reliable build system.
>>
>> pull/4812  creates the release
>> artifacts, although it is not hooked up yet with authentication.
>>
>> I expect to help make incremental progress over the next month converting
>> some of the integration tests, but welcome incremental improvements from
>> others.
>>
>>
>>
>> On Thu, Mar 22, 2018 at 9:57 AM Romain Manni-Bucau 
>> wrote:
>>
>>>
>>>
>>> 2018-03-22 17:45 GMT+01:00 Lukasz Cwik :
>>>
 what do we do? "Gradle migration will happen incrementally."

 "last months prooved beam cant maintain 2 systems, easier with that
 state is then to drop gradle since it is a 0 investment compared to the
 opposite"
 Its unfortunate that you feel this way but many people do not share
 your opinion.

>>>
>>> And a lot do so when a project is 50-50 it is time to act.
>>>
>>> Incrementally kind of means never (makes 4 months and nothing really
>>> changed in PRs and habits, gradle maintener(s) are still alone)
>>>
>>>


 On Thu, Mar 22, 2018 at 9:32 AM Romain Manni-Bucau <
 rmannibu...@gmail.com> wrote:

> @Valentyn: concretely any user can PR and be part of that process so
> anyone can do it wrong (me first)
> @Luskasz, Hennking: fine but what do we do? last months prooved beam
> cant maintain 2 systems, easier with that state is then to drop gradle
> since it is a 0 investment compared to the opposite
>
>
> Romain Manni-Bucau
> @rmannibucau | Blog | Old Blog | Github | LinkedIn | Book
>
> 2018-03-22 17:24 GMT+01:00 Lukasz Cwik :
>
>> Romain, from the previous discussions several people agreed that
>> running a fixit that migrated Maven to Gradle over a 1 or 2 day period 
>> was
>> worthwhile but there was nobody in the community with the time commitment
>> to organize and run it so the status quo plan remained where the Gradle
>> migration will happen incrementally.
>>
>>
>> On Thu, Mar 22, 2018 at 8:53 AM Henning Rohde 
>> wrote:
>>
>>> My understanding was the same as Ismaël's. I don't think breaking
>>> the build with a large known gaps (but not fully known cost) is 
>>> practical.
>>> Also, most items in the jira are not even assigned yet.
>>>
>>>
>>> On Thu, Mar 22, 2018 at 8:03 AM Romain Manni-Bucau <
>>> rmannibu...@gmail.com> wrote:
>>>
 Not really Ismaël, this thread was about to do it at once and have
 1 day to fix it all.

 As mentionned at the very beginning nobody maintains the 2 system
 so it must stop after months so either we drop maven or gradle *at 
 once*
 or we keep a state where each dev does what he wants and the build
 system just doesn't work.

 2018-03-22 15:42 GMT+01:00 Ismaël Mejía :

> I don't think that removing all maven descriptors was the expected
> path, no ? Or even a good idea at this moment.
>
> I understood that what we were going to do was to replace
> incrementally the CI until we cover the whole maven functionality

Re: Gradle status

2018-03-22 Thread Lukasz Cwik
Romain, that is incorrect. Contributors are to maintain both systems until
Gradle replaces that functionality in Maven and then it only needs to be
maintained in Gradle.


On Thu, Mar 22, 2018 at 10:30 AM Romain Manni-Bucau 
wrote:

> Ok so to be clear for any contributor (which is the goal of this thread):
> maven is still the main build system and no need to maintain gradle in PR
> then until beam switches.
>
> Im more than fine with that.
>
> On 22 March 2018 at 18:22, "Alan Myrvold" wrote:
>
>> I think the investment in gradle is worthwhile, and incrementally we will
>> continue to make progress. From what I've seem, gradle is a good fit for
>> this project and a path to a faster, more reliable build system.
>>
>> pull/4812  creates the release
>> artifacts, although it is not hooked up yet with authentication.
>>
>> I expect to help make incremental progress over the next month converting
>> some of the integration tests, but welcome incremental improvements from
>> others.
>>
>>
>>
>> On Thu, Mar 22, 2018 at 9:57 AM Romain Manni-Bucau 
>> wrote:
>>
>>>
>>>
>>> 2018-03-22 17:45 GMT+01:00 Lukasz Cwik :
>>>
 what do we do? "Gradle migration will happen incrementally."

 "last months prooved beam cant maintain 2 systems, easier with that
 state is then to drop gradle since it is a 0 investment compared to the
 opposite"
 Its unfortunate that you feel this way but many people do not share
 your opinion.

>>>
>>> And a lot do so when a project is 50-50 it is time to act.
>>>
>>> Incrementally kind of means never (makes 4 months and nothing really
>>> changed in PRs and habits, gradle maintener(s) are still alone)
>>>
>>>


 On Thu, Mar 22, 2018 at 9:32 AM Romain Manni-Bucau <
 rmannibu...@gmail.com> wrote:

> @Valentyn: concretely any user can PR and be part of that process so
> anyone can do it wrong (me first)
> @Luskasz, Hennking: fine but what do we do? last months prooved beam
> cant maintain 2 systems, easier with that state is then to drop gradle
> since it is a 0 investment compared to the opposite
>
>
> Romain Manni-Bucau
> @rmannibucau | Blog | Old Blog | Github | LinkedIn | Book
>
> 2018-03-22 17:24 GMT+01:00 Lukasz Cwik :
>
>> Romain, from the previous discussions several people agreed that
>> running a fixit that migrated Maven to Gradle over a 1 or 2 day period 
>> was
>> worthwhile but there was nobody in the community with the time commitment
>> to organize and run it so the status quo plan remained where the Gradle
>> migration will happen incrementally.
>>
>>
>> On Thu, Mar 22, 2018 at 8:53 AM Henning Rohde 
>> wrote:
>>
>>> My understanding was the same as Ismaël's. I don't think breaking
>>> the build with a large known gaps (but not fully known cost) is 
>>> practical.
>>> Also, most items in the jira are not even assigned yet.
>>>
>>>
>>> On Thu, Mar 22, 2018 at 8:03 AM Romain Manni-Bucau <
>>> rmannibu...@gmail.com> wrote:
>>>
 Not really Ismaël, this thread was about doing it at once and having
 1 day to fix it all.

 As mentioned at the very beginning, nobody maintains the 2 systems,
 so it must stop after months: either we drop Maven or Gradle *at once*
 or we keep a state where each dev does what he wants and the build
 system just doesn't work.

 2018-03-22 15:42 GMT+01:00 Ismaël Mejía :

> I don't think that removing all maven descriptors was the expected
> path, no ? Or even a good idea at this moment.
>
> I understood that what we were going to do was to replace
> incrementally the CI until we cover the whole maven functionality and
> then remove it. From looking at the JIRA ticket
> https://issues.apache.org/jira/browse/BEAM-3249 we are still far from
> covering the complete maven functionality, in particular for the
> release part that could be the biggest pain point.
>
>
> On Thu, Mar 22, 2018 at 9:30 AM, Romain Manni-Bucau
>  wrote:
> > hey guys,
> >
> > 2.4 is out, do we plan to drop all maven descriptors tomorrow or on
> > monday?
> >
> >
> > Romain Manni-Bucau
> > @rmannibucau |  Blog | Old Blog | Github | LinkedIn | Book
> >

Re: Jenkins wait times

2018-03-22 Thread Lukasz Cwik
The maximum passing runtime was 70 mins in Gradle, so it seems reasonable
to set it to 90 mins.
For Maven, it's about 130 mins so setting it to 150 mins is also reasonable.

It turns out that ~1/8 of builds were aborted due to taking too long in
both Maven / Java due to tests getting stuck.
It would be worthwhile to set a maximum runtime per test to prevent this.

Feel free to open a PR and send it my way.
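
The two proposed limits follow the same arithmetic: the maximum passing runtime plus about 20 minutes of headroom. A small sketch of that calculation is below. The function name suggest_timeout and the fixed 20-minute headroom are assumptions inferred from the two figures above (70 to 90 and 130 to 150); they are not a project policy.

```python
def suggest_timeout(max_passing_minutes, headroom_minutes=20):
    """Return a build timeout: longest passing run plus fixed headroom.

    The 20-minute default is an assumption inferred from the numbers in the
    message (70 -> 90 and 130 -> 150), not an agreed rule.
    """
    return max_passing_minutes + headroom_minutes


# Maximum passing runtimes quoted in the message, in minutes.
max_passing = {"Gradle": 70, "Maven": 130}

timeouts = {name: suggest_timeout(minutes) for name, minutes in max_passing.items()}
print(timeouts)
```

A fixed headroom keeps the timeout close to real build times, so a stuck build is aborted after minutes of extra waiting instead of hours.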


On Thu, Mar 22, 2018 at 10:14 AM Udi Meiri  wrote:

> Hi,
> I've been seeing increased wait times on Jenkins. It's frustrating to wait
> 8h for a build, or 4h for it just to schedule.
>
> Data point:
>
> https://builds.apache.org/job/beam_PreCommit_Java_GradleBuild/buildTimeTrend
> These builds seem to be timing out after 4 hours. Can we reduce the
> timeout to just 1 hour?
>
>
>


Re: Gradle status

2018-03-22 Thread Romain Manni-Bucau
Ok, so to be clear for any contributor (which is the goal of this thread):
Maven is still the main build system, and there is no need to maintain Gradle
in PRs until Beam switches.

I'm more than fine with that.

On 22 March 2018 at 18:22, "Alan Myrvold" wrote:

> I think the investment in gradle is worthwhile, and incrementally we will
> continue to make progress. From what I've seen, gradle is a good fit for
> this project and a path to a faster, more reliable build system.
>
> pull/4812 creates the release
> artifacts, although it is not hooked up yet with authentication.
>
> I expect to help make incremental progress over the next month converting
> some of the integration tests, but welcome incremental improvements from
> others.
>
>
>
> On Thu, Mar 22, 2018 at 9:57 AM Romain Manni-Bucau 
> wrote:
>
>>
>>
>> 2018-03-22 17:45 GMT+01:00 Lukasz Cwik :
>>
>>> what do we do? "Gradle migration will happen incrementally."
>>>
>>> "last months prooved beam cant maintain 2 systems, easier with that
>>> state is then to drop gradle since it is a 0 investment compared to the
>>> opposite"
>>> Its unfortunate that you feel this way but many people do not share your
>>> opinion.
>>>
>>
>> And a lot do so when a project is 50-50 it is time to act.
>>
>> Incrementally kind of means never (makes 4 months and nothing really
>> changed in PRs and habits, gradle maintener(s) are still alone)
>>
>>
>>>
>>>
>>> On Thu, Mar 22, 2018 at 9:32 AM Romain Manni-Bucau <
>>> rmannibu...@gmail.com> wrote:
>>>
 @Valentyn: concretely any user can PR and be part of that process so
 anyone can do it wrong (me first)
 @Luskasz, Hennking: fine but what do we do? last months prooved beam
 cant maintain 2 systems, easier with that state is then to drop gradle
 since it is a 0 investment compared to the opposite


 Romain Manni-Bucau
 @rmannibucau  |  Blog
  | Old Blog
  | Github
  | LinkedIn
  | Book
 

 2018-03-22 17:24 GMT+01:00 Lukasz Cwik :

> Romain, from the previous discussions several people agreed that
> running a fixit that migrated Maven to Gradle over a 1 or 2 day period was
> worthwhile but there was nobody in the community with the time commitment
> to organize and run it so the status quo plan remained where the Gradle
> migration will happen incrementally.
>
>
> On Thu, Mar 22, 2018 at 8:53 AM Henning Rohde 
> wrote:
>
>> My understanding was the same as Ismaël's. I don't think breaking the
>> build with a large known gaps (but not fully known cost) is practical.
>> Also, most items in the jira are not even assigned yet.
>>
>>
>> On Thu, Mar 22, 2018 at 8:03 AM Romain Manni-Bucau <
>> rmannibu...@gmail.com> wrote:
>>
>>> Not really Ismaël, this thread was about to do it at once and have 1
>>> day to fix it all.
>>>
>>> As mentionned at the very beginning nobody maintains the 2 system so
>>> it must stop after months so either we drop maven or gradle *at once*
>>> or we keep a state where each dev does what he wants and the build
>>> system just doesn't work.
>>>
>>> 2018-03-22 15:42 GMT+01:00 Ismaël Mejía :
>>>
 I don't think that removing all maven descriptors was the expected
 path, no ? Or even a good idea at this moment.

 I understood that what we were going to do was to replace
 incrementally the CI until we cover the whole maven functionality
 and
 then remove it, from looking at the JIRA ticket
 https://issues.apache.org/jira/browse/BEAM-3249 we are still far
 from
 covering the complete maven functionality in particular for the
 release part that could be the biggest pain point.


 On Thu, Mar 22, 2018 at 9:30 AM, Romain Manni-Bucau
  wrote:
 > hey guys,
 >
 > 2.4 is out, do we plan to drop all maven descriptors tomorrow or
 on monday?
 >
 >
 > Romain Manni-Bucau
 > @rmannibucau |  Blog | Old Blog | Github | LinkedIn | Book
 >
 > 2018-03-09 21:42 GMT+01:00 Kenneth Knowles :
 >>
 >> On Fri, Mar 9, 2018 at 12:16 PM Lukasz Cwik 
 wrote:
 >>>
 >>> Based upon your description it seems as though you would rather
 have a
 >>> way to run existing postcommits without it impacting
 dashboard/health
 >>> 

Re: Gradle status

2018-03-22 Thread Alan Myrvold
I think the investment in gradle is worthwhile, and incrementally we will
continue to make progress. From what I've seen, gradle is a good fit for
this project and a path to a faster, more reliable build system.

pull/4812 creates the release
artifacts, although it is not yet hooked up with authentication.

I expect to help make incremental progress over the next month converting
some of the integration tests, but welcome incremental improvements from
others.



On Thu, Mar 22, 2018 at 9:57 AM Romain Manni-Bucau 
wrote:

>
>
> 2018-03-22 17:45 GMT+01:00 Lukasz Cwik :
>
>> what do we do? "Gradle migration will happen incrementally."
>>
>> "last months prooved beam cant maintain 2 systems, easier with that
>> state is then to drop gradle since it is a 0 investment compared to the
>> opposite"
>> Its unfortunate that you feel this way but many people do not share your
>> opinion.
>>
>
> And a lot do so when a project is 50-50 it is time to act.
>
> Incrementally kind of means never (makes 4 months and nothing really
> changed in PRs and habits, gradle maintener(s) are still alone)
>
>
>>
>>
>> On Thu, Mar 22, 2018 at 9:32 AM Romain Manni-Bucau 
>> wrote:
>>
>>> @Valentyn: concretely any user can PR and be part of that process so
>>> anyone can do it wrong (me first)
>>> @Luskasz, Hennking: fine but what do we do? last months prooved beam
>>> cant maintain 2 systems, easier with that state is then to drop gradle
>>> since it is a 0 investment compared to the opposite
>>>
>>>
>>> Romain Manni-Bucau
>>> @rmannibucau  |  Blog
>>>  | Old Blog
>>>  | Github
>>>  | LinkedIn
>>>  | Book
>>> 
>>>
>>> 2018-03-22 17:24 GMT+01:00 Lukasz Cwik :
>>>
 Romain, from the previous discussions several people agreed that
 running a fixit that migrated Maven to Gradle over a 1 or 2 day period was
 worthwhile but there was nobody in the community with the time commitment
 to organize and run it so the status quo plan remained where the Gradle
 migration will happen incrementally.


 On Thu, Mar 22, 2018 at 8:53 AM Henning Rohde 
 wrote:

> My understanding was the same as Ismaël's. I don't think breaking the
> build with a large known gaps (but not fully known cost) is practical.
> Also, most items in the jira are not even assigned yet.
>
>
> On Thu, Mar 22, 2018 at 8:03 AM Romain Manni-Bucau <
> rmannibu...@gmail.com> wrote:
>
>> Not really Ismaël, this thread was about to do it at once and have 1
>> day to fix it all.
>>
>> As mentionned at the very beginning nobody maintains the 2 system so
>> it must stop after months so either we drop maven or gradle *at once*
>> or we keep a state where each dev does what he wants and the build
>> system just doesn't work.
>>
>> 2018-03-22 15:42 GMT+01:00 Ismaël Mejía :
>>
>>> I don't think that removing all maven descriptors was the expected
>>> path, no ? Or even a good idea at this moment.
>>>
>>> I understood that what we were going to do was to replace
>>> incrementally the CI until we cover the whole maven functionality and
>>> then remove it, from looking at the JIRA ticket
>>> https://issues.apache.org/jira/browse/BEAM-3249 we are still far
>>> from
>>> covering the complete maven functionality in particular for the
>>> release part that could be the biggest pain point.
>>>
>>>
>>> On Thu, Mar 22, 2018 at 9:30 AM, Romain Manni-Bucau
>>>  wrote:
>>> > hey guys,
>>> >
>>> > 2.4 is out, do we plan to drop all maven descriptors tomorrow or
>>> on monday?
>>> >
>>> >
>>> > Romain Manni-Bucau
>>> > @rmannibucau |  Blog | Old Blog | Github | LinkedIn | Book
>>> >
>>> > 2018-03-09 21:42 GMT+01:00 Kenneth Knowles :
>>> >>
>>> >> On Fri, Mar 9, 2018 at 12:16 PM Lukasz Cwik 
>>> wrote:
>>> >>>
>>> >>> Based upon your description it seems as though you would rather
>>> have a
>>> >>> way to run existing postcommits without it impacting
>>> dashboard/health
>>> >>> stats/notifications/ (We have just run the PostCommits on
>>> PRs for
>>> >>> additional validation (like upgrading the Dataflow container
>>> image)).
>>> >>
>>> >>
>>> >> Yes, that is exactly what I have described.
>>> >>
>>> >>> I don't think that keeping the current Java PreCommit as a proxy
>>> for the
>>> >>> the Java PostCommit is the right way to go but I also don't have
>>> 

Jenkins wait times

2018-03-22 Thread Udi Meiri
Hi,
I've been seeing increased wait times on Jenkins. It's frustrating to wait
8h for a build, or 4h for it just to schedule.

Data point:
https://builds.apache.org/job/beam_PreCommit_Java_GradleBuild/buildTimeTrend
These builds seem to be timing out after 4 hours. Can we reduce the timeout
to just 1 hour?




Re: Gradle status

2018-03-22 Thread Romain Manni-Bucau
2018-03-22 17:45 GMT+01:00 Lukasz Cwik :

> what do we do? "Gradle migration will happen incrementally."
>
> "last months prooved beam cant maintain 2 systems, easier with that state
> is then to drop gradle since it is a 0 investment compared to the opposite"
> Its unfortunate that you feel this way but many people do not share your
> opinion.
>

And a lot do, so when a project is split 50-50 it is time to act.

Incrementally kind of means never (it has been 4 months and nothing has really
changed in PRs and habits; the Gradle maintainer(s) are still alone).


>
>
> On Thu, Mar 22, 2018 at 9:32 AM Romain Manni-Bucau 
> wrote:
>
>> @Valentyn: concretely any user can PR and be part of that process so
>> anyone can do it wrong (me first)
>> @Luskasz, Hennking: fine but what do we do? last months prooved beam cant
>> maintain 2 systems, easier with that state is then to drop gradle since it
>> is a 0 investment compared to the opposite
>>
>>
>> Romain Manni-Bucau
>> @rmannibucau  |  Blog
>>  | Old Blog
>>  | Github
>>  | LinkedIn
>>  | Book
>> 
>>
>> 2018-03-22 17:24 GMT+01:00 Lukasz Cwik :
>>
>>> Romain, from the previous discussions several people agreed that running
>>> a fixit that migrated Maven to Gradle over a 1 or 2 day period was
>>> worthwhile but there was nobody in the community with the time commitment
>>> to organize and run it so the status quo plan remained where the Gradle
>>> migration will happen incrementally.
>>>
>>>
>>> On Thu, Mar 22, 2018 at 8:53 AM Henning Rohde 
>>> wrote:
>>>
 My understanding was the same as Ismaël's. I don't think breaking the
 build with a large known gaps (but not fully known cost) is practical.
 Also, most items in the jira are not even assigned yet.


 On Thu, Mar 22, 2018 at 8:03 AM Romain Manni-Bucau <
 rmannibu...@gmail.com> wrote:

> Not really Ismaël, this thread was about to do it at once and have 1
> day to fix it all.
>
> As mentionned at the very beginning nobody maintains the 2 system so
> it must stop after months so either we drop maven or gradle *at once*
> or we keep a state where each dev does what he wants and the build
> system just doesn't work.
>
> 2018-03-22 15:42 GMT+01:00 Ismaël Mejía :
>
>> I don't think that removing all maven descriptors was the expected
>> path, no ? Or even a good idea at this moment.
>>
>> I understood that what we were going to do was to replace
>> incrementally the CI until we cover the whole maven functionality and
>> then remove it, from looking at the JIRA ticket
>> https://issues.apache.org/jira/browse/BEAM-3249 we are still far from
>> covering the complete maven functionality in particular for the
>> release part that could be the biggest pain point.
>>
>>
>> On Thu, Mar 22, 2018 at 9:30 AM, Romain Manni-Bucau
>>  wrote:
>> > hey guys,
>> >
>> > 2.4 is out, do we plan to drop all maven descriptors tomorrow or on
>> monday?
>> >
>> >
>> > Romain Manni-Bucau
>> > @rmannibucau |  Blog | Old Blog | Github | LinkedIn | Book
>> >
>> > 2018-03-09 21:42 GMT+01:00 Kenneth Knowles :
>> >>
>> >> On Fri, Mar 9, 2018 at 12:16 PM Lukasz Cwik 
>> wrote:
>> >>>
>> >>> Based upon your description it seems as though you would rather
>> have a
>> >>> way to run existing postcommits without it impacting
>> dashboard/health
>> >>> stats/notifications/ (We have just run the PostCommits on PRs
>> for
>> >>> additional validation (like upgrading the Dataflow container
>> image)).
>> >>
>> >>
>> >> Yes, that is exactly what I have described.
>> >>
>> >>> I don't think that keeping the current Java PreCommit as a proxy
>> for the
>> >>> the Java PostCommit is the right way to go but I also don't have
>> the time to
>> >>> implement what your actually asking for.
>> >>
>> >>
>> >> Mostly I thought this might be very easy based on the fact that
>> they are
>> >> nearly identical. If not, oh well.
>> >>
>> >> Kenn
>> >>
>> >>
>> >>> It seems more likely that migrating the PostCommit to Gradle will
>> be less
>> >>> work then adding the functionality but your argument where the
>> PreCommit is
>> >>> a proxy for the Java PostCommit also applies to the
>> ValidatesRunner
>> >>> PostCommits and so forth requiring even more migration to happen
>> before you
>> >>> don't have to worry about maintaining Maven/breaking post commits.
>> >>>

Re: Gradle status

2018-03-22 Thread Lukasz Cwik
what do we do? "Gradle migration will happen incrementally."

"last months prooved beam cant maintain 2 systems, easier with that state
is then to drop gradle since it is a 0 investment compared to the opposite"
It's unfortunate that you feel this way, but many people do not share your
opinion.


On Thu, Mar 22, 2018 at 9:32 AM Romain Manni-Bucau 
wrote:

> @Valentyn: concretely any user can PR and be part of that process so
> anyone can do it wrong (me first)
> @Luskasz, Hennking: fine but what do we do? last months prooved beam cant
> maintain 2 systems, easier with that state is then to drop gradle since it
> is a 0 investment compared to the opposite
>
>
> Romain Manni-Bucau
> @rmannibucau  |  Blog
>  | Old Blog
>  | Github
>  | LinkedIn
>  | Book
> 
>
> 2018-03-22 17:24 GMT+01:00 Lukasz Cwik :
>
>> Romain, from the previous discussions several people agreed that running
>> a fixit that migrated Maven to Gradle over a 1 or 2 day period was
>> worthwhile but there was nobody in the community with the time commitment
>> to organize and run it so the status quo plan remained where the Gradle
>> migration will happen incrementally.
>>
>>
>> On Thu, Mar 22, 2018 at 8:53 AM Henning Rohde  wrote:
>>
>>> My understanding was the same as Ismaël's. I don't think breaking the
>>> build with a large known gaps (but not fully known cost) is practical.
>>> Also, most items in the jira are not even assigned yet.
>>>
>>>
>>> On Thu, Mar 22, 2018 at 8:03 AM Romain Manni-Bucau <
>>> rmannibu...@gmail.com> wrote:
>>>
 Not really Ismaël, this thread was about to do it at once and have 1
 day to fix it all.

 As mentionned at the very beginning nobody maintains the 2 system so it
 must stop after months so either we drop maven or gradle *at once*
 or we keep a state where each dev does what he wants and the build
 system just doesn't work.

 2018-03-22 15:42 GMT+01:00 Ismaël Mejía :

> I don't think that removing all maven descriptors was the expected
> path, no ? Or even a good idea at this moment.
>
> I understood that what we were going to do was to replace
> incrementally the CI until we cover the whole maven functionality and
> then remove it, from looking at the JIRA ticket
> https://issues.apache.org/jira/browse/BEAM-3249 we are still far from
> covering the complete maven functionality in particular for the
> release part that could be the biggest pain point.
>
>
> On Thu, Mar 22, 2018 at 9:30 AM, Romain Manni-Bucau
>  wrote:
> > hey guys,
> >
> > 2.4 is out, do we plan to drop all maven descriptors tomorrow or on
> monday?
> >
> >
> > Romain Manni-Bucau
> > @rmannibucau |  Blog | Old Blog | Github | LinkedIn | Book
> >
> > 2018-03-09 21:42 GMT+01:00 Kenneth Knowles :
> >>
> >> On Fri, Mar 9, 2018 at 12:16 PM Lukasz Cwik 
> wrote:
> >>>
> >>> Based upon your description it seems as though you would rather
> have a
> >>> way to run existing postcommits without it impacting
> dashboard/health
> >>> stats/notifications/ (We have just run the PostCommits on PRs
> for
> >>> additional validation (like upgrading the Dataflow container
> image)).
> >>
> >>
> >> Yes, that is exactly what I have described.
> >>
> >>> I don't think that keeping the current Java PreCommit as a proxy
> for the
> >>> the Java PostCommit is the right way to go but I also don't have
> the time to
> >>> implement what your actually asking for.
> >>
> >>
> >> Mostly I thought this might be very easy based on the fact that
> they are
> >> nearly identical. If not, oh well.
> >>
> >> Kenn
> >>
> >>
> >>> It seems more likely that migrating the PostCommit to Gradle will
> be less
> >>> work then adding the functionality but your argument where the
> PreCommit is
> >>> a proxy for the Java PostCommit also applies to the ValidatesRunner
> >>> PostCommits and so forth requiring even more migration to happen
> before you
> >>> don't have to worry about maintaining Maven/breaking post commits.
> >>>
> >>> I'm fine with leaving both the Java/Gradle PreCommits running for
> now and
> >>> hopefully as more of the PostCommits are migrated off we will be
> able to
> >>> remove it.
> >>>
> >>> On Fri, Mar 9, 2018 at 11:39 AM, Kenneth Knowles 
> wrote:
> 
>  Separate history (for easy dashboarding, health stats, etc) and
>  

Re: Gradle status

2018-03-22 Thread Romain Manni-Bucau
@Valentyn: concretely, any user can open a PR and be part of that process, so
anyone can do it wrong (me first).
@Lukasz, Henning: fine, but what do we do? The last months proved Beam can't
maintain 2 systems; from that state the easier path is to drop Gradle, since
it is a 0 investment compared to the opposite.


Romain Manni-Bucau
@rmannibucau | Blog | Old Blog | Github | LinkedIn | Book


2018-03-22 17:24 GMT+01:00 Lukasz Cwik :

> Romain, from the previous discussions several people agreed that running a
> fixit that migrated Maven to Gradle over a 1 or 2 day period was worthwhile
> but there was nobody in the community with the time commitment to organize
> and run it so the status quo plan remained where the Gradle migration will
> happen incrementally.
>
>
> On Thu, Mar 22, 2018 at 8:53 AM Henning Rohde  wrote:
>
>> My understanding was the same as Ismaël's. I don't think breaking the
>> build with a large known gaps (but not fully known cost) is practical.
>> Also, most items in the jira are not even assigned yet.
>>
>>
>> On Thu, Mar 22, 2018 at 8:03 AM Romain Manni-Bucau 
>> wrote:
>>
>>> Not really Ismaël, this thread was about to do it at once and have 1 day
>>> to fix it all.
>>>
>>> As mentionned at the very beginning nobody maintains the 2 system so it
>>> must stop after months so either we drop maven or gradle *at once*
>>> or we keep a state where each dev does what he wants and the build
>>> system just doesn't work.
>>>
>>> 2018-03-22 15:42 GMT+01:00 Ismaël Mejía :
>>>
 I don't think that removing all maven descriptors was the expected
 path, no ? Or even a good idea at this moment.

 I understood that what we were going to do was to replace
 incrementally the CI until we cover the whole maven functionality and
 then remove it, from looking at the JIRA ticket
 https://issues.apache.org/jira/browse/BEAM-3249 we are still far from
 covering the complete maven functionality in particular for the
 release part that could be the biggest pain point.


 On Thu, Mar 22, 2018 at 9:30 AM, Romain Manni-Bucau
  wrote:
 > hey guys,
 >
 > 2.4 is out, do we plan to drop all maven descriptors tomorrow or on
 monday?
 >
 >
 > Romain Manni-Bucau
 > @rmannibucau |  Blog | Old Blog | Github | LinkedIn | Book
 >
 > 2018-03-09 21:42 GMT+01:00 Kenneth Knowles :
 >>
 >> On Fri, Mar 9, 2018 at 12:16 PM Lukasz Cwik 
 wrote:
 >>>
 >>> Based upon your description it seems as though you would rather
 have a
 >>> way to run existing postcommits without it impacting
 dashboard/health
 >>> stats/notifications/ (We have just run the PostCommits on PRs
 for
 >>> additional validation (like upgrading the Dataflow container
 image)).
 >>
 >>
 >> Yes, that is exactly what I have described.
 >>
 >>> I don't think that keeping the current Java PreCommit as a proxy
 for the
 >>> the Java PostCommit is the right way to go but I also don't have
 the time to
 >>> implement what your actually asking for.
 >>
 >>
 >> Mostly I thought this might be very easy based on the fact that they
 are
 >> nearly identical. If not, oh well.
 >>
 >> Kenn
 >>
 >>
 >>> It seems more likely that migrating the PostCommit to Gradle will
 be less
 >>> work then adding the functionality but your argument where the
 PreCommit is
 >>> a proxy for the Java PostCommit also applies to the ValidatesRunner
 >>> PostCommits and so forth requiring even more migration to happen
 before you
 >>> don't have to worry about maintaining Maven/breaking post commits.
 >>>
 >>> I'm fine with leaving both the Java/Gradle PreCommits running for
 now and
 >>> hopefully as more of the PostCommits are migrated off we will be
 able to
 >>> remove it.
 >>>
 >>> On Fri, Mar 9, 2018 at 11:39 AM, Kenneth Knowles 
 wrote:
 
  Separate history (for easy dashboarding, health stats, etc) and
  notification (email to dev@ for postcommits, nothing for
 precommits) for pre
  & post commit targets.
 
  A post commit failure is always a problem to be triaged at high
  priority, while a precommit failure is just a natural occurrence.
 
  On Fri, Mar 9, 2018 at 11:33 AM Lukasz Cwik 
 wrote:
 >
 > Ken, I'm probably not seeing something but how does using the
 PreCommit
 > as a proxy improve upon just 

Re: Gradle status

2018-03-22 Thread Lukasz Cwik
Romain, in the previous discussions several people agreed that running a
fixit that migrated Maven to Gradle over a 1 or 2 day period was worthwhile,
but nobody in the community had the time to organize and run it, so the
status quo plan remained: the Gradle migration will happen incrementally.


On Thu, Mar 22, 2018 at 8:53 AM Henning Rohde  wrote:

> My understanding was the same as Ismaël's. I don't think breaking the
> build with a large known gaps (but not fully known cost) is practical.
> Also, most items in the jira are not even assigned yet.
>
>
> On Thu, Mar 22, 2018 at 8:03 AM Romain Manni-Bucau 
> wrote:
>
>> Not really Ismaël, this thread was about to do it at once and have 1 day
>> to fix it all.
>>
>> As mentionned at the very beginning nobody maintains the 2 system so it
>> must stop after months so either we drop maven or gradle *at once*
>> or we keep a state where each dev does what he wants and the build system
>> just doesn't work.
>>
>> 2018-03-22 15:42 GMT+01:00 Ismaël Mejía :
>>
>>> I don't think that removing all maven descriptors was the expected
>>> path, no ? Or even a good idea at this moment.
>>>
>>> I understood that what we were going to do was to replace
>>> incrementally the CI until we cover the whole maven functionality and
>>> then remove it, from looking at the JIRA ticket
>>> https://issues.apache.org/jira/browse/BEAM-3249 we are still far from
>>> covering the complete maven functionality in particular for the
>>> release part that could be the biggest pain point.
>>>
>>>
>>> On Thu, Mar 22, 2018 at 9:30 AM, Romain Manni-Bucau
>>>  wrote:
>>> > hey guys,
>>> >
>>> > 2.4 is out, do we plan to drop all maven descriptors tomorrow or on
>>> monday?
>>> >
>>> >
>>> > Romain Manni-Bucau
>>> > @rmannibucau |  Blog | Old Blog | Github | LinkedIn | Book
>>> >
>>> > 2018-03-09 21:42 GMT+01:00 Kenneth Knowles :
>>> >>
>>> >> On Fri, Mar 9, 2018 at 12:16 PM Lukasz Cwik  wrote:
>>> >>>
>>> >>> Based upon your description it seems as though you would rather have
>>> a
>>> >>> way to run existing postcommits without it impacting dashboard/health
>>> >>> stats/notifications/ (We have just run the PostCommits on PRs for
>>> >>> additional validation (like upgrading the Dataflow container image)).
>>> >>
>>> >>
>>> >> Yes, that is exactly what I have described.
>>> >>
>>> >>> I don't think that keeping the current Java PreCommit as a proxy for
>>> the
>>> >>> the Java PostCommit is the right way to go but I also don't have the
>>> time to
>>> >>> implement what your actually asking for.
>>> >>
>>> >>
>>> >> Mostly I thought this might be very easy based on the fact that they
>>> are
>>> >> nearly identical. If not, oh well.
>>> >>
>>> >> Kenn
>>> >>
>>> >>
>>> >>> It seems more likely that migrating the PostCommit to Gradle will be
>>> less
>>> >>> work then adding the functionality but your argument where the
>>> PreCommit is
>>> >>> a proxy for the Java PostCommit also applies to the ValidatesRunner
>>> >>> PostCommits and so forth requiring even more migration to happen
>>> before you
>>> >>> don't have to worry about maintaining Maven/breaking post commits.
>>> >>>
>>> >>> I'm fine with leaving both the Java/Gradle PreCommits running for
>>> now and
>>> >>> hopefully as more of the PostCommits are migrated off we will be
>>> able to
>>> >>> remove it.
>>> >>>
>>> >>> On Fri, Mar 9, 2018 at 11:39 AM, Kenneth Knowles 
>>> wrote:
>>> 
>>>  Separate history (for easy dashboarding, health stats, etc) and
>>>  notification (email to dev@ for postcommits, nothing for
>>> precommits) for pre
>>>  & post commit targets.
>>> 
>>>  A post commit failure is always a problem to be triaged at high
>>>  priority, while a precommit failure is just a natural occurrence.
>>> 
>>>  On Fri, Mar 9, 2018 at 11:33 AM Lukasz Cwik 
>>> wrote:
>>> >
>>> > Ken, I'm probably not seeing something but how does using the
>>> PreCommit
>>> > as a proxy improve upon just running the post commit via the
>>> phrase it
>>> > already supports ('Run Java PostCommit')?
>>> >
>>> > On Fri, Mar 9, 2018 at 11:22 AM, Kenneth Knowles 
>>> > wrote:
>>> >>
>>> >> Indeed, we've already had the discussion a couple of times and I
>>> think
>>> >> the criteria are clearly met. Incremental progress is a good
>>> thing and we
>>> >> shouldn't block it.
>>> >>
>>> >> OTOH I see where Romain is coming from and I have a good example
>>> that
>>> >> supports a slightly different action. Consider
>>> >> https://github.com/apache/beam/pull/4740 which fixes some errors
>>> in how we
>>> >> use dependency mechanisms.
>>> >>
>>> >> This PR is green except that I need to fix some Maven pom slightly
>>> >> more. That is throwaway 

Re: Help with Dynamic writing

2018-03-22 Thread OrielResearch Eila Arich-Landkof
Thanks. Everything is working!
Eila
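
For anyone landing on this thread later: the point made in the quoted exchange
below is that the function passed to beam.Partition has to return an integer
partition number, not a string such as the sample name. A minimal sketch of one
way to do that, assuming the sample names are known up front (SAMPLES,
SAMPLE_TO_INDEX and the gs:// path are illustrative only; the Beam wiring is
left in comments so the snippet runs without apache_beam installed):

```python
# Hypothetical fixed list of sample names; in practice it would come from the data.
SAMPLES = ['GSM2313641', 'GSM2312355']
SAMPLE_TO_INDEX = {name: i for i, name in enumerate(SAMPLES)}


def partition_fn(element, num_partitions):
    """Map an element's 'sample' string to an integer partition index."""
    return SAMPLE_TO_INDEX[element['sample']]


rows = [
    {'sample': 'GSM2313641', 'SNRPCP14': 0},
    {'sample': 'GSM2312355', 'SNRPCP14': 0},
]
indices = [partition_fn(row, len(SAMPLES)) for row in rows]
print(indices)

# With Beam, the same function would be wired in roughly like this
# (rows_pcoll is an assumed PCollection of dicts like the rows above):
#   partitions = rows_pcoll | beam.Partition(partition_fn, len(SAMPLES))
#   for name, partition in zip(SAMPLES, partitions):
#       partition | 'write ' + name >> beam.io.WriteToText(
#           'gs://bucket/output/' + name + '/part')
```

Because the list order defines the index, the partition name for output i is
simply SAMPLES[i], which answers the "how do I extract the partition name"
question without storing it on the element.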

On Thu, Mar 22, 2018 at 3:38 AM, Chamikara Jayalath 
wrote:

>
>
> On Wed, Mar 21, 2018 at 1:57 PM OrielResearch Eila Arich-Landkof <
> e...@orielresearch.org> wrote:
>
>> yes. You were right, I had to put back the pc.
>> I am working on the partition function and trying to debug it without
>> running a pipeline on Dataflow (Dataflow execution takes at least 8 minutes
>> for any size of data), based on the link:
>> https://medium.com/google-cloud/quickly-experiment-with-dataflow-3d5a0da8d8e9
>>
>> *my code & questions:*
>> # Is it possible to use string as partition name? if not, what would be
>> the simplest solution?
>> # generating another PCollection with the samples (index) and merge with
>> the data table? if yes,
>> # how would I extract that information when I am writing to text file
>> def partition_fn(element, num_partitions):
>>   return(element['sample'])
>>
>
> I believe partition function has to return an integer that is the
> partition number for a given element. Please see following code snippet for
> an example usage.
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets.py#L1155
>
> Also, that example shows how you can write resulting PCollections to text
> files.
>
>
>>
>> # debug input, no pipeline
>>
>> *d = [{u'sample': u'GSM2313641', u'SNRPCP14': 0},{u'sample':
>> u'GSM231', u'SNRPCP14': 0},{u'sample': u'GSM2312355', u'SNRPCP14': 0}]
>> | beam.Flatten()*
>>
>> *d*
>> # output:
>>
>> [(u'sample', u'GSM2312355'),
>>  (u'SNRPCP14', 0),
>>  (u'sample', u'GSM2313641'),
>>  (u'SNRPCP14', 0),
>>  (u'sample', u'GSM231'),
>>  (u'SNRPCP14', 0)]
>>
>>
>>
>> *d | beam.Partition(partition_fn,3)*
>> #output, error:
>>
>> TypeError: tuple indices must be integers, not str [while running 
>> 'Partition(CallableWrapperPartitionFn)/ParDo(ApplyPartitionFnFn)/ParDo(ApplyPartitionFnFn)']
>>
>>
> Please see above.
>
>
>>
>> *#writing to dynamic output - how do i extract the partition name that
>> the element is assigned to?*
>> *for i, partition in enumerate(partitions):*
>> *  # The partition path is using the partition str name*
>> *  pathi= str('gs://bucket/output/'+x["sample"]+'/')*
>> *  partition | label >> beam.io.WriteToText(pathi)*
>>
>>
>> Please also let me know if there is a better way to debug the code before
>> running on dataflow runner.
>>
>
> You can test code using DirectRunner before using DataflowRunner. Both
> runners should produce the same result.
>
>
>>
>> Many thanks,
>> Eila
>>
>>
>>
>> On Wed, Mar 21, 2018 at 12:43 PM, Chamikara Jayalath <
>> chamik...@google.com> wrote:
>>
>>> On Wed, Mar 21, 2018 at 7:53 AM OrielResearch Eila Arich-Landkof <
>>> e...@orielresearch.org> wrote:
>>>
 Hi Cham,

 *all_data = pcollections | beam.Flatten()*

 fires an error:

 TypeError: 'Read' object is not iterable


 pcollections is the following list:

 [,
  ,
  ,
  ]



>>> Did you omit "p | " in "p | beam.io.Read" by any chance ? Not sure how
>>> you ended up with a list of Read PTransforms otherwise.
>>>
>>> Also, follow everything with a "p.run().wait_until_finish()" for the
>>> pipeline to execute.
>>>
>>> Can you paste the code that you are running ?
>>>
>>>
 Based on the following, i converted the list to tuples 
 (tuple(*pcollections)) with the same error for tuple.*


 # Flatten takes a tuple of PCollection objects.
 # Returns a single PCollection that contains all of the elements in the
 # PCollection objects in that tuple.
 merged = (
     (pcoll1, pcoll2, pcoll3)
     # A list of tuples can be "piped" directly into a Flatten transform.
     | beam.Flatten())


 Any advice?

 Many thanks,
 Eila


 On Wed, Mar 21, 2018 at 9:16 AM, OrielResearch Eila Arich-Landkof <
 e...@orielresearch.org> wrote:

> very helpful!!! i will keep you posted if I have any issue / question
> Best,
> Eila
>
>
> On Tue, Mar 20, 2018 at 5:08 PM, Chamikara Jayalath <
> chamik...@google.com> wrote:
>
>>
>>
>> On Tue, Mar 20, 2018 at 12:54 PM OrielResearch Eila Arich-Landkof <
>> e...@orielresearch.org> wrote:
>>
>>> Hi Cham,
>>>
>>> Please see inline. If possible, code / pseudo code will help a lot.
>>> Thanks,
>>> Eila
>>>
>>> On Tue, Mar 20, 2018 at 1:15 PM, Chamikara Jayalath <
>>> chamik...@google.com> wrote:
>>>
 Hi Eila,

 Please find my comments inline.

 On Tue, Mar 20, 2018 at 8:02 AM OrielResearch Eila Arich-Landkof <
 e...@orielresearch.org> wrote:

> Hello all,
>
> It was nice to meet you 

Re: Gradle status

2018-03-22 Thread Henning Rohde
My understanding was the same as Ismaël's. I don't think breaking the build
with large known gaps (but not fully known cost) is practical. Also, most
items in the JIRA are not even assigned yet.


On Thu, Mar 22, 2018 at 8:03 AM Romain Manni-Bucau 
wrote:

> Not really Ismaël, this thread was about to do it at once and have 1 day
> to fix it all.
>
> As mentionned at the very beginning nobody maintains the 2 system so it
> must stop after months so either we drop maven or gradle *at once*
> or we keep a state where each dev does what he wants and the build system
> just doesn't work.
>
> 2018-03-22 15:42 GMT+01:00 Ismaël Mejía :
>
>> I don't think that removing all maven descriptors was the expected
>> path, no ? Or even a good idea at this moment.
>>
>> I understood that what we were going to do was to replace
>> incrementally the CI until we cover the whole maven functionality and
>> then remove it, from looking at the JIRA ticket
>> https://issues.apache.org/jira/browse/BEAM-3249 we are still far from
>> covering the complete maven functionality in particular for the
>> release part that could be the biggest pain point.
>>
>>
>> On Thu, Mar 22, 2018 at 9:30 AM, Romain Manni-Bucau
>>  wrote:
>> > hey guys,
>> >
>> > 2.4 is out, do we plan to drop all maven descriptors tomorrow or on
>> monday?
>> >
>> >
>> > Romain Manni-Bucau
>> > @rmannibucau |  Blog | Old Blog | Github | LinkedIn | Book
>> >
>> > 2018-03-09 21:42 GMT+01:00 Kenneth Knowles :
>> >>
>> >> On Fri, Mar 9, 2018 at 12:16 PM Lukasz Cwik  wrote:
>> >>>
>> >>> Based upon your description it seems as though you would rather have a
>> >>> way to run existing postcommits without it impacting dashboard/health
>> >>> stats/notifications/ (We have just run the PostCommits on PRs for
>> >>> additional validation (like upgrading the Dataflow container image)).
>> >>
>> >>
>> >> Yes, that is exactly what I have described.
>> >>
>> >>> I don't think that keeping the current Java PreCommit as a proxy for
>> the
>> >>> the Java PostCommit is the right way to go but I also don't have the
>> time to
>> >>> implement what your actually asking for.
>> >>
>> >>
>> >> Mostly I thought this might be very easy based on the fact that they
>> are
>> >> nearly identical. If not, oh well.
>> >>
>> >> Kenn
>> >>
>> >>
>> >>> It seems more likely that migrating the PostCommit to Gradle will be
>> less
>> >>> work then adding the functionality but your argument where the
>> PreCommit is
>> >>> a proxy for the Java PostCommit also applies to the ValidatesRunner
>> >>> PostCommits and so forth requiring even more migration to happen
>> before you
>> >>> don't have to worry about maintaining Maven/breaking post commits.
>> >>>
>> >>> I'm fine with leaving both the Java/Gradle PreCommits running for now
>> and
>> >>> hopefully as more of the PostCommits are migrated off we will be able
>> to
>> >>> remove it.
>> >>>
>> >>> On Fri, Mar 9, 2018 at 11:39 AM, Kenneth Knowles 
>> wrote:
>> 
>>  Separate history (for easy dashboarding, health stats, etc) and
>>  notification (email to dev@ for postcommits, nothing for
>> precommits) for pre
>>  & post commit targets.
>> 
>>  A post commit failure is always a problem to be triaged at high
>>  priority, while a precommit failure is just a natural occurrence.
>> 
>>  On Fri, Mar 9, 2018 at 11:33 AM Lukasz Cwik 
>> wrote:
>> >
>> > Ken, I'm probably not seeing something but how does using the
>> PreCommit
>> > as a proxy improve upon just running the post commit via the phrase
>> it
>> > already supports ('Run Java PostCommit')?
>> >
>> > On Fri, Mar 9, 2018 at 11:22 AM, Kenneth Knowles 
>> > wrote:
>> >>
>> >> Indeed, we've already had the discussion a couple of times and I
>> think
>> >> the criteria are clearly met. Incremental progress is a good thing
>> and we
>> >> shouldn't block it.
>> >>
>> >> OTOH I see where Romain is coming from and I have a good example
>> that
>> >> supports a slightly different action. Consider
>> >> https://github.com/apache/beam/pull/4740 which fixes some errors
>> in how we
>> >> use dependency mechanisms.
>> >>
>> >> This PR is green except that I need to fix some Maven pom slightly
>> >> more. That is throwaway work. I would love to just not have to do
>> it. But
>> >> removing the precommit does not actually make the PR OK to merge.
>> It would
>> >> cause postcommits to fail.
>> >>
>> >> We can hope such situations are rare. I think I tend to be hit by
>> this
>> >> more often than most, as I work with the project build health
>> quite a bit.
>> >>
>> >> Here is a proposal to support these things: instead of deleting the
>> >> job in #4814, move it to not run automatically but only via a
>> phrase. 

Re: Gradle status

2018-03-22 Thread Romain Manni-Bucau
Not really, Ismaël: this thread was about doing it all at once and taking
one day to fix everything.

As mentioned at the very beginning, nobody maintains both systems, so after
months of this it must stop: either we drop Maven or Gradle *at once*,
or we keep a state where each dev does what he wants and the build system
just doesn't work.

2018-03-22 15:42 GMT+01:00 Ismaël Mejía :

> I don't think that removing all maven descriptors was the expected
> path, no ? Or even a good idea at this moment.
>
> I understood that what we were going to do was to replace
> incrementally the CI until we cover the whole maven functionality and
> then remove it, from looking at the JIRA ticket
> https://issues.apache.org/jira/browse/BEAM-3249 we are still far from
> covering the complete maven functionality in particular for the
> release part that could be the biggest pain point.
>
>
> On Thu, Mar 22, 2018 at 9:30 AM, Romain Manni-Bucau
>  wrote:
> > hey guys,
> >
> > 2.4 is out, do we plan to drop all maven descriptors tomorrow or on
> monday?
> >
> >
> > Romain Manni-Bucau
> > @rmannibucau |  Blog | Old Blog | Github | LinkedIn | Book
> >
> > 2018-03-09 21:42 GMT+01:00 Kenneth Knowles :
> >>
> >> On Fri, Mar 9, 2018 at 12:16 PM Lukasz Cwik  wrote:
> >>>
> >>> Based upon your description it seems as though you would rather have a
> >>> way to run existing postcommits without it impacting dashboard/health
> >>> stats/notifications/ (We have just run the PostCommits on PRs for
> >>> additional validation (like upgrading the Dataflow container image)).
> >>
> >>
> >> Yes, that is exactly what I have described.
> >>
> >>> I don't think that keeping the current Java PreCommit as a proxy for
> the
> >>> the Java PostCommit is the right way to go but I also don't have the
> time to
> >>> implement what your actually asking for.
> >>
> >>
> >> Mostly I thought this might be very easy based on the fact that they are
> >> nearly identical. If not, oh well.
> >>
> >> Kenn
> >>
> >>
> >>> It seems more likely that migrating the PostCommit to Gradle will be
> less
> >>> work then adding the functionality but your argument where the
> PreCommit is
> >>> a proxy for the Java PostCommit also applies to the ValidatesRunner
> >>> PostCommits and so forth requiring even more migration to happen
> before you
> >>> don't have to worry about maintaining Maven/breaking post commits.
> >>>
> >>> I'm fine with leaving both the Java/Gradle PreCommits running for now
> and
> >>> hopefully as more of the PostCommits are migrated off we will be able
> to
> >>> remove it.
> >>>
> >>> On Fri, Mar 9, 2018 at 11:39 AM, Kenneth Knowles 
> wrote:
> 
>  Separate history (for easy dashboarding, health stats, etc) and
>  notification (email to dev@ for postcommits, nothing for precommits)
> for pre
>  & post commit targets.
> 
>  A post commit failure is always a problem to be triaged at high
>  priority, while a precommit failure is just a natural occurrence.
> 
>  On Fri, Mar 9, 2018 at 11:33 AM Lukasz Cwik  wrote:
> >
> > Ken, I'm probably not seeing something but how does using the
> PreCommit
> > as a proxy improve upon just running the post commit via the phrase
> it
> > already supports ('Run Java PostCommit')?
> >
> > On Fri, Mar 9, 2018 at 11:22 AM, Kenneth Knowles 
> > wrote:
> >>
> >> Indeed, we've already had the discussion a couple of times and I
> think
> >> the criteria are clearly met. Incremental progress is a good thing
> and we
> >> shouldn't block it.
> >>
> >> OTOH I see where Romain is coming from and I have a good example
> that
> >> supports a slightly different action. Consider
> >> https://github.com/apache/beam/pull/4740 which fixes some errors
> in how we
> >> use dependency mechanisms.
> >>
> >> This PR is green except that I need to fix some Maven pom slightly
> >> more. That is throwaway work. I would love to just not have to do
> it. But
> >> removing the precommit does not actually make the PR OK to merge.
> It would
> >> cause postcommits to fail.
> >>
> >> We can hope such situations are rare. I think I tend to be hit by
> this
> >> more often than most, as I work with the project build health quite
> a bit.
> >>
> >> Here is a proposal to support these things: instead of deleting the
> >> job in #4814, move it to not run automatically but only via a
> phrase. In
> >> fact, you could migrate it to be the manually-invoked version of the
> >> postcommit job as we've discussed a couple times. Then if someone
> is working
> >> on the build in something like #4740 they can invoke it manually.
> >>
> >> Kenn
> >>
> >> On Fri, Mar 9, 2018 at 10:25 AM Lukasz Cwik 
> wrote:
> >>>
> >>> Based upon the criteria that was 

Re: Gradle status

2018-03-22 Thread Ismaël Mejía
I don't think that removing all Maven descriptors was the expected
path, was it? Or even a good idea at this moment.

I understood that we were going to replace the CI incrementally
until we cover the whole Maven functionality, and only
then remove Maven. Looking at the JIRA ticket
https://issues.apache.org/jira/browse/BEAM-3249, we are still far from
covering the complete Maven functionality, in particular for the
release part, which could be the biggest pain point.


On Thu, Mar 22, 2018 at 9:30 AM, Romain Manni-Bucau
 wrote:
> hey guys,
>
> 2.4 is out, do we plan to drop all maven descriptors tomorrow or on monday?
>
>
> Romain Manni-Bucau
> @rmannibucau |  Blog | Old Blog | Github | LinkedIn | Book
>
> 2018-03-09 21:42 GMT+01:00 Kenneth Knowles :
>>
>> On Fri, Mar 9, 2018 at 12:16 PM Lukasz Cwik  wrote:
>>>
>>> Based upon your description it seems as though you would rather have a
>>> way to run existing postcommits without it impacting dashboard/health
>>> stats/notifications/ (We have just run the PostCommits on PRs for
>>> additional validation (like upgrading the Dataflow container image)).
>>
>>
>> Yes, that is exactly what I have described.
>>
>>> I don't think that keeping the current Java PreCommit as a proxy for the
>>> the Java PostCommit is the right way to go but I also don't have the time to
>>> implement what your actually asking for.
>>
>>
>> Mostly I thought this might be very easy based on the fact that they are
>> nearly identical. If not, oh well.
>>
>> Kenn
>>
>>
>>> It seems more likely that migrating the PostCommit to Gradle will be less
>>> work then adding the functionality but your argument where the PreCommit is
>>> a proxy for the Java PostCommit also applies to the ValidatesRunner
>>> PostCommits and so forth requiring even more migration to happen before you
>>> don't have to worry about maintaining Maven/breaking post commits.
>>>
>>> I'm fine with leaving both the Java/Gradle PreCommits running for now and
>>> hopefully as more of the PostCommits are migrated off we will be able to
>>> remove it.
>>>
>>> On Fri, Mar 9, 2018 at 11:39 AM, Kenneth Knowles  wrote:

 Separate history (for easy dashboarding, health stats, etc) and
 notification (email to dev@ for postcommits, nothing for precommits) for 
 pre
 & post commit targets.

 A post commit failure is always a problem to be triaged at high
 priority, while a precommit failure is just a natural occurrence.

 On Fri, Mar 9, 2018 at 11:33 AM Lukasz Cwik  wrote:
>
> Ken, I'm probably not seeing something but how does using the PreCommit
> as a proxy improve upon just running the post commit via the phrase it
> already supports ('Run Java PostCommit')?
>
> On Fri, Mar 9, 2018 at 11:22 AM, Kenneth Knowles 
> wrote:
>>
>> Indeed, we've already had the discussion a couple of times and I think
>> the criteria are clearly met. Incremental progress is a good thing and we
>> shouldn't block it.
>>
>> OTOH I see where Romain is coming from and I have a good example that
>> supports a slightly different action. Consider
>> https://github.com/apache/beam/pull/4740 which fixes some errors in how 
>> we
>> use dependency mechanisms.
>>
>> This PR is green except that I need to fix some Maven pom slightly
>> more. That is throwaway work. I would love to just not have to do it. But
>> removing the precommit does not actually make the PR OK to merge. It 
>> would
>> cause postcommits to fail.
>>
>> We can hope such situations are rare. I think I tend to be hit by this
>> more often than most, as I work with the project build health quite a 
>> bit.
>>
>> Here is a proposal to support these things: instead of deleting the
>> job in #4814, move it to not run automatically but only via a phrase. In
>> fact, you could migrate it to be the manually-invoked version of the
>> postcommit job as we've discussed a couple times. Then if someone is 
>> working
>> on the build in something like #4740 they can invoke it manually.
>>
>> Kenn
>>
>> On Fri, Mar 9, 2018 at 10:25 AM Lukasz Cwik  wrote:
>>>
>>> Based upon the criteria that was discussed on the mailing list[1], I
>>> would agree with Kenn about merging PR/4814 (drop Java Maven precommit).
>>>
>>> 1:
>>> https://lists.apache.org/thread.html/7eba5c77bc1a77b5046d915ab59f5f6fc41536c2c84863ad2efb5e99@%3Cdev.beam.apache.org%3E
>>>
>>> On Thu, Mar 8, 2018 at 9:10 PM, Romain Manni-Bucau
>>>  wrote:

 Hi Kenneth,

 For now maven covers the full needs of beam. If we start to have
 this kind of PR we become dependent of the 2 builds which is what this
 thread is about avoiding 

Re: Build breaks on examples on jenkins with dataflow runner

2018-03-22 Thread Etienne Chauchot
Also, WDYT about running these tests as PostCommit instead of PreCommit, since
they are integration tests?
Etienne
On Thursday, 22 March 2018 at 09:49 +0100, Etienne Chauchot wrote:
> Hi all,
> The Java PreCommit test fails on Jenkins in the examples module (WordCountIT).
> It gives an incorrect signal on the build of PRs.
> It seems to be related to communication issues with the Dataflow service.
> 
> org.apache.beam.examples.WindowedWordCountIT.testWindowedWordCountInBatchStaticSharding
> or 
> org.apache.beam.examples.WindowedWordCountIT.testWindowedWordCountInBatchDynamicSharding
> A work item was attempted 4 times without success. Each time the worker 
> eventually lost contact with the service. The
> work item was attempted on: 
>   testpipeline-jenkins-0321-03210922-9f05-harness-qxtj,
>   testpipeline-jenkins-0321-03210922-9f05-harness-98n1,
>   testpipeline-jenkins-0321-03210922-9f05-harness-47mf,
>   testpipeline-jenkins-0321-03210922-9f05-harness-n1vb
> 
> 
> org.apache.beam.examples.WordCountIT.testE2EWordCount
> java.lang.RuntimeException: Workflow failed. Causes: The Dataflow appears to be stuck. You can get help with Cloud Dataflow at https://cloud.google.com/dataflow/support.
>   at org.apache.beam.runners.dataflow.TestDataflowRunner.run(TestDataflowRunner.java:134)
>   at org.apache.beam.runners.dataflow.TestDataflowRunner.run(TestDataflowRunner.java:90)
>   at org.apache.beam.runners.dataflow.TestDataflowRunner.run(TestDataflowRunner.java:55)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:311)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
>   at org.apache.beam.examples.WordCount.runWordCount(WordCount.java:185)
>   at org.apache.beam.examples.WordCountIT.testE2EWordCount(WordCountIT.java:70)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.apache.maven.surefire.junitcore.pc.Scheduler$1.run(Scheduler.java:410)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
>  
> Does anyone have a clue?
> 
> Etienne

Jenkins build is back to normal : beam_Release_NightlySnapshot #721

2018-03-22 Thread Apache Jenkins Server
See 




Re: [ANNOUNCE] Apache Beam 2.4.0 released

2018-03-22 Thread Romain Manni-Bucau
congrats guys


Romain Manni-Bucau
@rmannibucau | Blog | Old Blog | Github | LinkedIn | Book


2018-03-22 9:50 GMT+01:00 Etienne Chauchot :

> Great!
> On Thursday, 22 March 2018 at 08:24 +, Robert Bradshaw wrote:
>
> We are pleased to announce the release of Apache Beam 2.4.0. Thanks goes to
> the many people who made this possible.
>
> Apache Beam is an open source unified programming model to define and
> execute data processing pipelines, including ETL, batch and stream
> (continuous) processing. See https://beam.apache.org
>
> You can download the release here:
>
>  https://beam.apache.org/get-started/downloads/
>
> As well as many bugfixes, some notable changes in this release are:
> - A new Python Direct runner, up to 15x faster than the old one.
> - Kinesis support for reading and writing in Java
> - Several refactoring to enable portability (Go/Python on Flink/Spark)
>
> Full release notes can be found at
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12342682=12319527
>
> Enjoy!
>
>


Build breaks on examples on jenkins with dataflow runner

2018-03-22 Thread Etienne Chauchot
Hi all,
The Java PreCommit test fails on Jenkins in the examples module (WordCountIT).
It gives an incorrect signal on the build of PRs.
It seems to be related to communication issues with the Dataflow service.

org.apache.beam.examples.WindowedWordCountIT.testWindowedWordCountInBatchStaticSharding
or
org.apache.beam.examples.WindowedWordCountIT.testWindowedWordCountInBatchDynamicSharding
A work item was attempted 4 times without success. Each time the worker
eventually lost contact with the service. The work item was attempted on:
  testpipeline-jenkins-0321-03210922-9f05-harness-qxtj,
  testpipeline-jenkins-0321-03210922-9f05-harness-98n1,
  testpipeline-jenkins-0321-03210922-9f05-harness-47mf,
  testpipeline-jenkins-0321-03210922-9f05-harness-n1vb


org.apache.beam.examples.WordCountIT.testE2EWordCount
java.lang.RuntimeException: Workflow failed. Causes: The Dataflow appears to be stuck. You can get help with Cloud Dataflow at https://cloud.google.com/dataflow/support.
  at org.apache.beam.runners.dataflow.TestDataflowRunner.run(TestDataflowRunner.java:134)
  at org.apache.beam.runners.dataflow.TestDataflowRunner.run(TestDataflowRunner.java:90)
  at org.apache.beam.runners.dataflow.TestDataflowRunner.run(TestDataflowRunner.java:55)
  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:311)
  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
  at org.apache.beam.examples.WordCount.runWordCount(WordCount.java:185)
  at org.apache.beam.examples.WordCountIT.testE2EWordCount(WordCountIT.java:70)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
  at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
  at org.apache.maven.surefire.junitcore.pc.Scheduler$1.run(Scheduler.java:410)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
 
Does anyone have a clue?

Etienne

Re: [ANNOUNCE] Apache Beam 2.4.0 released

2018-03-22 Thread Etienne Chauchot
Great!
On Thursday, 22 March 2018 at 08:24 +, Robert Bradshaw wrote:
> We are pleased to announce the release of Apache Beam 2.4.0. Thanks goes to
> the many people who made this possible.
> 
> Apache Beam is an open source unified programming model to define and
> execute data processing pipelines, including ETL, batch and stream
> (continuous) processing. See https://beam.apache.org
> 
> You can download the release here:
> 
>  https://beam.apache.org/get-started/downloads/
> 
> As well as many bugfixes, some notable changes in this release are:
> - A new Python Direct runner, up to 15x faster than the old one.
> - Kinesis support for reading and writing in Java
> - Several refactoring to enable portability (Go/Python on Flink/Spark)
> 
> Full release notes can be found at
> 
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12342682=12319527
> 
> Enjoy!

Re: Gradle status

2018-03-22 Thread Romain Manni-Bucau
Hey guys,

2.4 is out. Do we plan to drop all Maven descriptors tomorrow or on Monday?


Romain Manni-Bucau
@rmannibucau | Blog | Old Blog | Github | LinkedIn | Book


2018-03-09 21:42 GMT+01:00 Kenneth Knowles :

> On Fri, Mar 9, 2018 at 12:16 PM Lukasz Cwik  wrote:
>
>> Based upon your description it seems as though you would rather have a
>> way to run existing postcommits without it impacting dashboard/health
>> stats/notifications/ (We have just run the PostCommits on PRs for
>> additional validation (like upgrading the Dataflow container image)).
>>
>
> Yes, that is exactly what I have described.
>
> I don't think that keeping the current Java PreCommit as a proxy for the
>> the Java PostCommit is the right way to go but I also don't have the time
>> to implement what your actually asking for.
>>
>
> Mostly I thought this might be very easy based on the fact that they are
> nearly identical. If not, oh well.
>
> Kenn
>
>
> It seems more likely that migrating the PostCommit to Gradle will be less
>> work then adding the functionality but your argument where the PreCommit is
>> a proxy for the Java PostCommit also applies to the ValidatesRunner
>> PostCommits and so forth requiring even more migration to happen before you
>> don't have to worry about maintaining Maven/breaking post commits.
>>
>> I'm fine with leaving both the Java/Gradle PreCommits running for now and
>> hopefully as more of the PostCommits are migrated off we will be able to
>> remove it.
>>
>> On Fri, Mar 9, 2018 at 11:39 AM, Kenneth Knowles  wrote:
>>
>>> Separate history (for easy dashboarding, health stats, etc) and
>>> notification (email to dev@ for postcommits, nothing for precommits)
>>> for pre & post commit targets.
>>>
>>> A post commit failure is always a problem to be triaged at high
>>> priority, while a precommit failure is just a natural occurrence.
>>>
>>> On Fri, Mar 9, 2018 at 11:33 AM Lukasz Cwik  wrote:
>>>
>>>> Ken, I'm probably not seeing something but how does using the PreCommit
>>>> as a proxy improve upon just running the post commit via the phrase it
>>>> already supports ('Run Java PostCommit')?
>>>>
>>>> On Fri, Mar 9, 2018 at 11:22 AM, Kenneth Knowles 
>>>> wrote:
>>>>
>>>>> Indeed, we've already had the discussion a couple of times and I think
>>>>> the criteria are clearly met. Incremental progress is a good thing and we
>>>>> shouldn't block it.
>>>>>
>>>>> OTOH I see where Romain is coming from and I have a good example that
>>>>> supports a slightly different action. Consider
>>>>> https://github.com/apache/beam/pull/4740 which fixes some errors in
>>>>> how we use dependency mechanisms.
>>>>>
>>>>> This PR is green except that I need to fix some Maven pom slightly
>>>>> more. That is throwaway work. I would love to just not have to do it. But
>>>>> removing the precommit does not actually make the PR OK to merge. It would
>>>>> cause postcommits to fail.
>>>>>
>>>>> We can hope such situations are rare. I think I tend to be hit by this
>>>>> more often than most, as I work with the project build health quite a bit.
>>>>>
>>>>> Here is a proposal to support these things: instead of deleting the
>>>>> job in #4814, move it to not run automatically but only via a phrase. In
>>>>> fact, you could migrate it to be the manually-invoked version of the
>>>>> postcommit job as we've discussed a couple times. Then if someone is
>>>>> working on the build in something like #4740 they can invoke it manually.
>>>>>
>>>>> Kenn
>>>>>
>>>>> On Fri, Mar 9, 2018 at 10:25 AM Lukasz Cwik  wrote:
>>>>>
>>>>>> Based upon the criteria that was discussed on the mailing list[1], I
>>>>>> would agree with Kenn about merging PR/4814 (drop Java Maven precommit).
>>>>>>
>>>>>> 1: https://lists.apache.org/thread.html/7eba5c77bc1a77b5046d915ab59f5f6fc41536c2c84863ad2efb5e99@%3Cdev.beam.apache.org%3E
>>>>>>
>>>>>> On Thu, Mar 8, 2018 at 9:10 PM, Romain Manni-Bucau <
>>>>>> rmannibu...@gmail.com> wrote:
>>
>>>>>>> Hi Kenneth,
>>>>>>>
>>>>>>> For now Maven covers the full needs of Beam. If we start to have
>>>>>>> this kind of PR we become dependent on the 2 builds, which is what this
>>>>>>> thread is about avoiding, so I'm tempted to say it must be a PR that
>>>>>>> drops Maven completely or nothing, as mentioned before.
>>>>>>>
>>>>>>> On 9 March 2018 at 04:48, "Kenneth Knowles" wrote:
>>>>>>>
>>>>>>>> I would like to briefly re-focus this discussion and suggest that
>>>>>>>> we merge https://github.com/apache/beam/pull/4814.
>>>>>>>>
>>>>>>>> The only material objection I've heard is that it means the
>>>>>>>> precommit no longer 

[ANNOUNCE] Apache Beam 2.4.0 released

2018-03-22 Thread Robert Bradshaw
We are pleased to announce the release of Apache Beam 2.4.0. Thanks go to
the many people who made this possible.

Apache Beam is an open source unified programming model to define and
execute data processing pipelines, including ETL, batch and stream
(continuous) processing. See https://beam.apache.org

You can download the release here:

 https://beam.apache.org/get-started/downloads/

As well as many bugfixes, some notable changes in this release are:
- A new Python Direct runner, up to 15x faster than the old one.
- Kinesis support for reading and writing in Java
- Several refactorings to enable portability (Go/Python on Flink/Spark)

Full release notes can be found at

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12342682&projectId=12319527

Enjoy!