Re: Status of our CI tools

2017-04-28 Thread Mingmin Xu
+1
I have ignored TravisCI for some time, as the failures are not related to
code/test issues.

I still hope TravisCI could work with the Beam code repository some day, to run
tests before creating a PR.

Mingmin

> On Apr 28, 2017, at 10:26 PM, Aljoscha Krettek  wrote:
> 
> Big +1
> 
>> On 29. Apr 2017, at 07:21, Robert Bradshaw  
>> wrote:
>> 
>> On Fri, Apr 28, 2017 at 9:56 PM, Jean-Baptiste Onofré  
>> wrote:
>>> +1
>>> 
>>> Travis is useless and our Jenkins is good IMHO !
>> 
>> Travis is really useful for the Python SDK, but I'm hopeful that soon
>> Jenkins will be stable and quick enough that I won't miss it, and
>> having only one CI to deal with should simplify things.
>> 
>> - Robert
> 


Re: Status of our CI tools

2017-04-28 Thread Aljoscha Krettek
Big +1

> On 29. Apr 2017, at 07:21, Robert Bradshaw  
> wrote:
> 
> On Fri, Apr 28, 2017 at 9:56 PM, Jean-Baptiste Onofré  
> wrote:
>> +1
>> 
>> Travis is useless and our Jenkins is good IMHO !
> 
> Travis is really useful for the Python SDK, but I'm hopeful that soon
> Jenkins will be stable and quick enough that I won't miss it, and
> having only one CI to deal with should simplify things.
> 
> - Robert



Re: Status of our CI tools

2017-04-28 Thread Robert Bradshaw
On Fri, Apr 28, 2017 at 9:56 PM, Jean-Baptiste Onofré  wrote:
> +1
>
> Travis is useless and our Jenkins is good IMHO !

Travis is really useful for the Python SDK, but I'm hopeful that soon
Jenkins will be stable and quick enough that I won't miss it, and
having only one CI to deal with should simplify things.

- Robert


Re: Status of our CI tools

2017-04-28 Thread Jean-Baptiste Onofré

+1

Travis is useless and our Jenkins is good IMHO !

Thanks.
Regards
JB

On 04/29/2017 03:22 AM, Davor Bonaci wrote:

Early on in the project, we've discussed our CI needs and concluded to use
ASF-hosted Jenkins as our preferred tool of choice. We've also enabled
Travis-CI, which covered some scenarios that Jenkins couldn't do at the
time, but with the idea to transition to Jenkins eventually.

Over the last few months, Travis-CI has been broken consistently, and
several different kinds of infrastructure breakages have been added, one on
top of another. This has caused plenty of cost and confusion. In
particular, contributors often get confused as to which signal they should
care about.

At the same time, Jenkins capabilities have improved greatly: multiple
parallel precommits are now supported, checked-in DSL support, pipelined
matrix builds, Google's donation of Jenkins executors more than doubled,
and others.

So, based on the previous consensus and the fact the signal was broken for
a long time, Jason and I went and asked Infra to disable Travis-CI on our
code repository. (Website repository was disabled months ago.)

I believe there should be minimal impact of this. The only two elements of
the Travis matrix that were passing (still) are Python SDK on the Linux &
Mac. Linux one can be trivially moved to Jenkins -- and I know Jason is
looking at that. Mac coverage is the only loss at the moment, but is
something we can likely address in the (near) future.

I'm excited that we finally managed to unify our CI tooling, and can make
efforts on improving and maintaining one system as opposed to two. That
said, please comment if you have any worries about this or ideas for
further CI improvements ;-)

Davor



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Status of our CI tools

2017-04-28 Thread Thomas Weise
+1 for letting Jenkins rule alone :)


On Fri, Apr 28, 2017 at 6:44 PM, Ted Yu  wrote:

> +1
>
> On Fri, Apr 28, 2017 at 6:24 PM, Thomas Groh 
> wrote:
>
> > +1! This will be really helpful when looking at my PRs; I basically get
> no
> > signal from the current state of the github UI, and this will restore
> that
> > to giving me very strong positive signal.
> >
> > On Fri, Apr 28, 2017 at 6:22 PM, Davor Bonaci  wrote:
> >
> > > Early on in the project, we've discussed our CI needs and concluded to
> > use
> > > ASF-hosted Jenkins as our preferred tool of choice. We've also enabled
> > > Travis-CI, which covered some scenarios that Jenkins couldn't do at the
> > > time, but with the idea to transition to Jenkins eventually.
> > >
> > > Over the last few months, Travis-CI has been broken consistently, and
> > > several different kinds of infrastructure breakages have been added,
> one
> > on
> > > top of another. This has caused plenty of cost and confusion. In
> > > particular, contributors often get confused as to which signal they
> > should
> > > care about.
> > >
> > > At the same time, Jenkins capabilities have improved greatly: multiple
> > > parallel precommits are now supported, checked-in DSL support,
> pipelined
> > > matrix builds, Google's donation of Jenkins executors more than
> doubled,
> > > and others.
> > >
> > > So, based on the previous consensus and the fact the signal was broken
> > for
> > > a long time, Jason and I went and asked Infra to disable Travis-CI on
> our
> > > code repository. (Website repository was disabled months ago.)
> > >
> > > I believe there should be minimal impact of this. The only two elements
> > of
> > > the Travis matrix that were passing (still) are Python SDK on the
> Linux &
> > > Mac. Linux one can be trivially moved to Jenkins -- and I know Jason is
> > > looking at that. Mac coverage is the only loss at the moment, but is
> > > something we can likely address in the (near) future.
> > >
> > > I'm excited that we finally managed to unify our CI tooling, and can
> make
> > > efforts on improving and maintaining one system as opposed to two. That
> > > said, please comment if you have any worries about this or ideas for
> > > further CI improvements ;-)
> > >
> > > Davor
> > >
> >
>


Re: Status of our CI tools

2017-04-28 Thread Ted Yu
+1

On Fri, Apr 28, 2017 at 6:24 PM, Thomas Groh 
wrote:

> +1! This will be really helpful when looking at my PRs; I basically get no
> signal from the current state of the github UI, and this will restore that
> to giving me very strong positive signal.
>
> On Fri, Apr 28, 2017 at 6:22 PM, Davor Bonaci  wrote:
>
> > Early on in the project, we've discussed our CI needs and concluded to
> use
> > ASF-hosted Jenkins as our preferred tool of choice. We've also enabled
> > Travis-CI, which covered some scenarios that Jenkins couldn't do at the
> > time, but with the idea to transition to Jenkins eventually.
> >
> > Over the last few months, Travis-CI has been broken consistently, and
> > several different kinds of infrastructure breakages have been added, one
> on
> > top of another. This has caused plenty of cost and confusion. In
> > particular, contributors often get confused as to which signal they
> should
> > care about.
> >
> > At the same time, Jenkins capabilities have improved greatly: multiple
> > parallel precommits are now supported, checked-in DSL support, pipelined
> > matrix builds, Google's donation of Jenkins executors more than doubled,
> > and others.
> >
> > So, based on the previous consensus and the fact the signal was broken
> for
> > a long time, Jason and I went and asked Infra to disable Travis-CI on our
> > code repository. (Website repository was disabled months ago.)
> >
> > I believe there should be minimal impact of this. The only two elements
> of
> > the Travis matrix that were passing (still) are Python SDK on the Linux &
> > Mac. Linux one can be trivially moved to Jenkins -- and I know Jason is
> > looking at that. Mac coverage is the only loss at the moment, but is
> > something we can likely address in the (near) future.
> >
> > I'm excited that we finally managed to unify our CI tooling, and can make
> > efforts on improving and maintaining one system as opposed to two. That
> > said, please comment if you have any worries about this or ideas for
> > further CI improvements ;-)
> >
> > Davor
> >
>


Re: Status of our CI tools

2017-04-28 Thread Thomas Groh
+1! This will be really helpful when looking at my PRs; I basically get no
signal from the current state of the GitHub UI, and this will restore that
to giving me a very strong positive signal.

On Fri, Apr 28, 2017 at 6:22 PM, Davor Bonaci  wrote:

> Early on in the project, we've discussed our CI needs and concluded to use
> ASF-hosted Jenkins as our preferred tool of choice. We've also enabled
> Travis-CI, which covered some scenarios that Jenkins couldn't do at the
> time, but with the idea to transition to Jenkins eventually.
>
> Over the last few months, Travis-CI has been broken consistently, and
> several different kinds of infrastructure breakages have been added, one on
> top of another. This has caused plenty of cost and confusion. In
> particular, contributors often get confused as to which signal they should
> care about.
>
> At the same time, Jenkins capabilities have improved greatly: multiple
> parallel precommits are now supported, checked-in DSL support, pipelined
> matrix builds, Google's donation of Jenkins executors more than doubled,
> and others.
>
> So, based on the previous consensus and the fact the signal was broken for
> a long time, Jason and I went and asked Infra to disable Travis-CI on our
> code repository. (Website repository was disabled months ago.)
>
> I believe there should be minimal impact of this. The only two elements of
> the Travis matrix that were passing (still) are Python SDK on the Linux &
> Mac. Linux one can be trivially moved to Jenkins -- and I know Jason is
> looking at that. Mac coverage is the only loss at the moment, but is
> something we can likely address in the (near) future.
>
> I'm excited that we finally managed to unify our CI tooling, and can make
> efforts on improving and maintaining one system as opposed to two. That
> said, please comment if you have any worries about this or ideas for
> further CI improvements ;-)
>
> Davor
>


Status of our CI tools

2017-04-28 Thread Davor Bonaci
Early on in the project, we discussed our CI needs and decided to use
ASF-hosted Jenkins as our preferred tool. We also enabled
Travis-CI, which covered some scenarios that Jenkins couldn't handle at the
time, but with the idea of transitioning to Jenkins eventually.

Over the last few months, Travis-CI has been broken consistently, and
several different kinds of infrastructure breakages have been added, one on
top of another. This has caused plenty of cost and confusion. In
particular, contributors often get confused as to which signal they should
care about.

At the same time, Jenkins capabilities have improved greatly: multiple
parallel precommits, checked-in DSL support, and pipelined matrix builds
are now supported, Google's donation of Jenkins executors has more than
doubled, and more.

So, based on the previous consensus and the fact the signal was broken for
a long time, Jason and I went and asked Infra to disable Travis-CI on our
code repository. (Website repository was disabled months ago.)

I believe the impact of this should be minimal. The only two elements of
the Travis matrix that were still passing are the Python SDK on Linux and
Mac. The Linux one can be trivially moved to Jenkins -- and I know Jason is
looking at that. Mac coverage is the only loss at the moment, but it is
something we can likely address in the (near) future.

I'm excited that we finally managed to unify our CI tooling, and can make
efforts on improving and maintaining one system as opposed to two. That
said, please comment if you have any worries about this or ideas for
further CI improvements ;-)

Davor


Re: Community hackathon

2017-04-28 Thread Davor Bonaci
Thanks everyone!

A quick summary:
* I've counted nearly 20 participants; there were possibly more.
* 11+ issues filed.
* 10+ bugs fixed.
* Some severe issues discovered, but most things did work as expected.
* Grew to 146 participants on the Slack channel, with nearly 20 new
participants in the past several days.

I learned quite a few new tricks, and I think we made a dent in our overall
stability -- thanks everyone!

Davor

On Wed, Apr 26, 2017 at 10:57 PM, Davor Bonaci  wrote:

> Let's start!
>
> A few quick notes on how to get started:
> https://docs.google.com/document/d/1UKC2R_9FkSdMVTz2nt2sIW18KoLbIu6w0aj9bwSSPiw/edit
>
> There's no particular progress on the Jenkins infrastructure issue.
> Current workaround: ask on Slack, and a committer can manually kick-off a
> build and give you a link to the Jenkins job. Inspect the job to see the
> results.
>
> It's getting late on the US West Coast, but I plan to be available for the
> next few hours. Then, I'll disappear for a few hours and then show up in
> the morning.
>
> Hope to see many of you on Slack, and let's make this a success!
>
> Davor
>
> On Wed, Apr 26, 2017 at 2:02 PM, Davor Bonaci  wrote:
>
>> The outage is still ongoing, unfortunately... I think we should make one
>> more (and final) delay.
>>
>> New (final) start time: 11 PM Pacific, and we start regardless of the
>> outage.
>>
>> On Wed, Apr 26, 2017 at 10:04 AM, Davor Bonaci  wrote:
>>
>>> *** DELAYED START ***
>>>
>>> Unfortunately, due to an ongoing Jenkins infrastructure issue,
>>> we'll have to delay the start of the hackathon.
>>>
>>> New (tentative) start time: 2 PM Pacific (4 hours from now).
>>>
>>> I apologize for the delay.
>>>
>>> Davor
>>>
>>> On Wed, Apr 26, 2017 at 2:03 AM, Tibor Kiss 
>>> wrote:
>>>
 I've created a meetup in Budapest for this event:
 https://www.meetup.com/futureofdata-budapest/events/239504356/

 We (folks from the Hortonworks office @ Budapest) will try to prep a
 demo
 for the event and if time allows we'll jump into open issues.

 On Tue, Apr 25, 2017 at 7:54 AM, Davor Bonaci  wrote:

 > Thanks everyone for the enthusiasm!
 >
 > Let's go with this Wednesday, 4/26, starting at 10 AM Pacific time,
 and
 > running for the following 24 hours. I'll try to seed the
 > instructions/starting point, and then let's take it from there.
 >
 > (Michael, invite sent.)
 >
 > Davor
 >
 > On Mon, Apr 24, 2017 at 7:47 PM, Michael Huston
 > wrote:
 >
 > > Could you please add me to the Slack channel also? My apologies
 for the
 > > noise on this mailing list and if there is a better way to request
 > access.
 > >
 > > Cheers,
 > > Michael
 > >
 > > On Mon, Apr 24, 2017 at 6:15 PM, Lukasz Cwik
 
 > > wrote:
 > >
 > > > Dylan, sent you invite to slack channel.
 > > >
 > > > On Mon, Apr 24, 2017 at 5:18 PM, Dylan Raithel <
 dylanrait...@gmail.com
 > >
 > > > wrote:
 > > >
 > > > > Can you please add me to the Slack channel?
 > > > >
 > > > > On Apr 24, 2017 12:51 AM, "Jean-Baptiste Onofré" <
 j...@nanthrax.net>
 > > > wrote:
 > > > >
 > > > > > That's a wonderful idea !
 > > > > >
 > > > > > I think the easiest way to organize this event is using the
 Slack
 > > > > channels
 > > > > > to discuss, help each other, and sync together.
 > > > > >
 > > > > > Regards
 > > > > > JB
 > > > > >
 > > > > > On 04/24/2017 09:48 AM, Davor Bonaci wrote:
 > > > > >
 > > > > >> We've been working as a community towards the first stable
 release
 > > > for a
 > > > > >> while now, and I think we made a ton of progress across the
 board
 > > over
 > > > > the
 > > > > >> last few weeks.
 > > > > >>
 > > > > >> We could try to organize a community-wide hackathon to
 identify
 > and
 > > > fix
 > > > > >> those last few issues, as well as to get a better sense of
 the
 > > overall
 > > > > >> project quality as it stands right now.
 > > > > >>
 > > > > >> This could be a self-organized event, and coordinated via the
 > Slack
 > > > > >> channel. For example, we (as a community and participants)
 can try
 > > out
 > > > > the
 > > > > >> project in various ways -- quickstart, examples, different
 > runners,
 > > > > >> different platforms -- immediately fixing issues as we run
 into
 > > them.
 > > > It
 > > > > >> could last, say, 24 hours, with people from different time
 zones
 > > > > >> participating at the time of their choosing.
 > > > > >>
 > > > > >> Thoughts?
 > > > > >>
 > > > > >> Davor
 > > > > >>
 > > > > >>
 > > > > > --
 > > > > > Jean-Baptiste Onofré
 > > > > > jbono...@apache.org
 > > > > > http://blog.nanthrax.net
 > > > > > Talend - http://www.talend.c

Re: [DISCUSSION] Encouraging more contributions

2017-04-28 Thread Sourabh Bajaj
I think they can probably reach out to the mentor with questions like: How
to navigate the code base? What parts of the code could they use as a
pattern? This could be done using the contributor's preferred mode of
communication.

My opinion is that large projects and communities may come across as
intimidating to first-time contributors, so being as welcoming and
encouraging as possible is important.

On Thu, Apr 27, 2017 at 8:52 PM Aviem Zur  wrote:

> @Sourabh Bajaj
>
> The mentoring on starter tickets is an interesting idea. How would it
> technically work?
>
> A new contributor assigns a starter ticket to themselves. What happens from
> there?
>
> On Tue, Apr 25, 2017 at 12:01 PM Ismaël Mejía  wrote:
>
> > I think it is important to clarify that the developer documentation
> > discussed in this thread is of two kinds:
> >
> > 6.1. Documents with proposals and new designs, those covered by the
> > Beam Improvement Proposal (BEAM-566), and that we need to put with a
> > single file index (I remember there was a google dir for this but not
> > sure it is still valid, and in any case probably the website is a
> > better place for this). Is there any progress on this?
> >
> > 6.2. Documentation about how things work, so new developers can get
> > into developing features/fixes for the project. These are the kind
> > that Kenneth/Etienne mention and include Stephen’s IO guide, but could
> > definitely be expanded to include things like how the different
> > runner translations work, or some details on triggers/materialization
> > of panes/windows from the SDK point of view. However, the hard part of
> > these documents is that they should be maintained, e.g. updated when the
> > code evolves, so they don’t get outdated, as JB mentions.
> >
> > On Tue, Apr 25, 2017 at 10:47 AM, Wesley Tanaka
> >  wrote:
> > > These are the ones I've come across so far, are there others?
> > >
> > > * Dynamic DoFn https://s.apache.org/a-new-dofn
> > >
> > > ** Splittable DoFn (Obsoletes Source API)
> > http://s.apache.org/splittable-do-fn
> > >
> > > ** State and Timers for DoFn: https://s.apache.org/beam-state
> > >
> > >
> > > * Lateness https://s.apache.org/beam-lateness
> > >
> > >
> > > * Metrics API http://s.apache.org/beam-metrics-api
> > >
> > > ** I/O Metrics https://s.apache.org/standard-io-metrics
> > >
> > >
> > > * Runner API http://s.apache.org/beam-runner-api
> > >
> > > ** https://s.apache.org/beam-runner-composites
> > >
> > > ** https://s.apache.org/beam-side-inputs-1-pager
> > >
> > >
> > > * Fn API http://s.apache.org/beam-fn-api
> > >
> > > ---
> > > Wesley Tanaka
> > > https://wtanaka.com/
> > >
> > >
> > > On Monday, April 24, 2017, 2:45:45 PM HST, Sourabh Bajaj <
> > sourabhba...@google.com.INVALID> wrote:
> > > For 6. I think having them in one page on the website where we can find
> > the
> > > design docs more easily would be great.
> > >
> > > 7. For low-hanging-fruit, one thing I really liked from some Mozilla
> > > projects was assigning a mentor on the ticket. Someone you can reach
> out
> > to
> > > if you have questions. I think this makes the entry barrier really low
> > for
> > > first time contributors who might feel intimidated asking questions
> > > completely in public.
> > >
> > > On Mon, Apr 24, 2017 at 10:06 AM Kenneth Knowles
>  > >
> > > wrote:
> > >
> > >> I like the subject Etienne has brought up, and will give it a number
> in
> > >> this list :-)
> > >>
> > >> 6. Have more technical reference docs (not just workspace set up) for
> > >> contributors.
> > >>
> > >> I think this overlaps a lot with a prior discussion about where to
> > collect
> > >> design proposals [1]. Design docs used to be just dropped into a
> public
> > >> folder, but that got disorganized. And that thread was about work in
> > >> progress, so JIRA was a good place for details after a dev@ thread
> > agrees
> > >> on a proposal. At this point, the designs are pretty solid
> conceptually
> > or
> > >> even implemented and we could start to build out deeper technical bits
> > on
> > >> the web site, or at least some place that people can find it. We do
> have
> > >> the Testing Guide and the PTransform Style Guide and somewhere near
> > there
> > >> we could have deeper references. I think we need a broader vision for
> > the
> > >> "table of contents" here.
> > >>
> > >> For my docs (triggers, lateness, runner API, side inputs, state,
> > coders) I
> > >> haven't had time, but I do intend to both translate from GDoc to some
> > other
> > >> format and also rewrite versions for users where appropriate. Probably
> > this
> > >> will mean coming up with that table of contents.
> > >>
> > >> Kenn
> > >>
> > >> [1]
> > >>
> > >>
> >
> https://lists.apache.org/thread.html/%3c6bc60c88-cf91-4fff-eae6-fea6ee06f...@nanthrax.net%3E
> > >>
> > >>
> > >> On Mon, Apr 24, 2017 at 9:33 AM, Neelesh Salian <
> > neeleshssal...@gmail.com>
> > >> wrote:
> > >>
> > >> > Agreed. I have some old JIRAs that I am cleaning up.
> > >>

Re: How to control watermark when using BoundedSource

2017-04-28 Thread Shen Li
Hi Thomas,

Thanks for the explanation. Does it mean I cannot reproduce the real-time
behavior of the replayed trace?  Say the watermarks are perfect and
FixedWindows groups elements into 1-minute windows, will the watermarks
trigger the FixedWindows to fire roughly every minute?

I am a little confused about the "when available" behavior of the runner.
Since the watermarks emitted by the BoundedSource will always be
BoundedWindow.TIMESTAMP_MIN_VALUE except for the last watermark, how could
the runner know when to trigger the computation on a window?

Thanks,

Shen

On Fri, Apr 28, 2017 at 1:13 PM, Thomas Groh 
wrote:

> You can't directly control the watermark that a BoundedSource emits.
> Windowing into FixedWindows will still work as you expect, however: your
> elements will be assigned to their windows based on the time the event
> occurred. Depending on the runner, triggers may be run either "when
> available" or after all the work is completed, but your output data will be
> as if you had a perfect watermark.
>
> On Fri, Apr 28, 2017 at 10:09 AM, Shen Li  wrote:
>
> > Hi,
> >
> > Say I want to replay a data trace of last week using fixed windows. The
> > data trace is read from a file using TextIO. In order to trigger windows
> at
> > right times, how can I control the watermark emitted by the
> BoundedSource?
> >
> > Thanks,
> >
> > Shen
> >
>


Re: How to control watermark when using BoundedSource

2017-04-28 Thread Thomas Groh
You can't directly control the watermark that a BoundedSource emits.
Windowing into FixedWindows will still work as you expect, however: your
elements will be assigned to their windows based on the time the event
occurred. Depending on the runner, triggers may be run either "when
available" or after all the work is completed, but your output data will be
as if you had a perfect watermark.
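
A rough sketch of the point above, in plain Python rather than the Beam API
(this is not how any runner is actually implemented): fixed-window assignment
depends only on each element's event timestamp, so a replayed trace lands in
the same windows no matter how the watermark advances. The names
`assign_fixed_window` and `group_by_window`, and the sample trace, are
hypothetical and only for illustration.

```python
from collections import defaultdict


def assign_fixed_window(event_ts, size):
    """Return the [start, end) fixed window that contains event_ts (seconds)."""
    start = event_ts - (event_ts % size)
    return (start, start + size)


def group_by_window(elements, size=60):
    """Group (event_ts, value) pairs into fixed windows using event time only."""
    windows = defaultdict(list)
    for event_ts, value in elements:
        windows[assign_fixed_window(event_ts, size)].append(value)
    return dict(windows)


# Hypothetical replayed trace: (event timestamp in seconds, payload).
trace = [(5, "a"), (59, "b"), (60, "c"), (125, "d")]
grouped = group_by_window(trace)
```

The grouping is the same whether results are emitted as the input is read or
only after all of it has been processed; only the time at which each window's
output appears differs.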

On Fri, Apr 28, 2017 at 10:09 AM, Shen Li  wrote:

> Hi,
>
> Say I want to replay a data trace of last week using fixed windows. The
> data trace is read from a file using TextIO. In order to trigger windows at
> right times, how can I control the watermark emitted by the BoundedSource?
>
> Thanks,
>
> Shen
>


How to control watermark when using BoundedSource

2017-04-28 Thread Shen Li
Hi,

Say I want to replay a data trace of last week using fixed windows. The
data trace is read from a file using TextIO. In order to trigger windows at
the right times, how can I control the watermark emitted by the BoundedSource?

Thanks,

Shen