Re: Status of our CI tools
+1

I have ignored TravisCI for some time, as its failures are not related to code or test issues. I still hope TravisCI can work with the Beam code repository some day, so that tests can run before a PR is created.

Mingmin

> On Apr 28, 2017, at 10:26 PM, Aljoscha Krettek wrote:
>
> Big +1
>
>> On 29. Apr 2017, at 07:21, Robert Bradshaw wrote:
>>
>> On Fri, Apr 28, 2017 at 9:56 PM, Jean-Baptiste Onofré wrote:
>>> +1
>>>
>>> Travis is useless and our Jenkins is good IMHO !
>>
>> Travis is really useful for the Python SDK, but I'm hopeful that soon
>> Jenkins will be stable and quick enough that I won't miss it, and
>> having only one CI to deal with should simplify things.
>>
>> - Robert
Re: Status of our CI tools
Big +1

> On 29. Apr 2017, at 07:21, Robert Bradshaw wrote:
>
> On Fri, Apr 28, 2017 at 9:56 PM, Jean-Baptiste Onofré wrote:
>> +1
>>
>> Travis is useless and our Jenkins is good IMHO !
>
> Travis is really useful for the Python SDK, but I'm hopeful that soon
> Jenkins will be stable and quick enough that I won't miss it, and
> having only one CI to deal with should simplify things.
>
> - Robert
Re: Status of our CI tools
On Fri, Apr 28, 2017 at 9:56 PM, Jean-Baptiste Onofré wrote:
> +1
>
> Travis is useless and our Jenkins is good IMHO !

Travis is really useful for the Python SDK, but I'm hopeful that soon Jenkins will be stable and quick enough that I won't miss it, and having only one CI to deal with should simplify things.

- Robert
Re: Status of our CI tools
+1

Travis is useless and our Jenkins is good IMHO !

Thanks.
Regards
JB

On 04/29/2017 03:22 AM, Davor Bonaci wrote:

> Early on in the project, we've discussed our CI needs and concluded to use
> ASF-hosted Jenkins as our preferred tool of choice. We've also enabled
> Travis-CI, which covered some scenarios that Jenkins couldn't do at the
> time, but with the idea to transition to Jenkins eventually.
>
> Over the last few months, Travis-CI has been broken consistently, and
> several different kinds of infrastructure breakages have been added, one on
> top of another. This has caused plenty of cost and confusion. In
> particular, contributors often get confused as to which signal they should
> care about.
>
> At the same time, Jenkins capabilities have improved greatly: multiple
> parallel precommits are now supported, checked-in DSL support, pipelined
> matrix builds, Google's donation of Jenkins executors more than doubled,
> and others.
>
> So, based on the previous consensus and the fact the signal was broken for
> a long time, Jason and I went and asked Infra to disable Travis-CI on our
> code repository. (Website repository was disabled months ago.)
>
> I believe there should be minimal impact of this. The only two elements of
> the Travis matrix that were passing (still) are Python SDK on the Linux &
> Mac. Linux one can be trivially moved to Jenkins -- and I know Jason is
> looking at that. Mac coverage is the only loss at the moment, but is
> something we can likely address in the (near) future.
>
> I'm excited that we finally managed to unify our CI tooling, and can make
> efforts on improving and maintaining one system as opposed to two. That
> said, please comment if you have any worries about this or ideas for
> further CI improvements ;-)
>
> Davor

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
Re: Status of our CI tools
+1 for letting Jenkins rule alone :)

On Fri, Apr 28, 2017 at 6:44 PM, Ted Yu wrote:

> +1
>
> On Fri, Apr 28, 2017 at 6:24 PM, Thomas Groh wrote:
>
> > +1! This will be really helpful when looking at my PRs; I basically get no
> > signal from the current state of the github UI, and this will restore that
> > to giving me very strong positive signal.
> >
> > On Fri, Apr 28, 2017 at 6:22 PM, Davor Bonaci wrote:
> >
> > > Early on in the project, we've discussed our CI needs and concluded to use
> > > ASF-hosted Jenkins as our preferred tool of choice. We've also enabled
> > > Travis-CI, which covered some scenarios that Jenkins couldn't do at the
> > > time, but with the idea to transition to Jenkins eventually.
> > >
> > > Over the last few months, Travis-CI has been broken consistently, and
> > > several different kinds of infrastructure breakages have been added, one on
> > > top of another. This has caused plenty of cost and confusion. In
> > > particular, contributors often get confused as to which signal they should
> > > care about.
> > >
> > > At the same time, Jenkins capabilities have improved greatly: multiple
> > > parallel precommits are now supported, checked-in DSL support, pipelined
> > > matrix builds, Google's donation of Jenkins executors more than doubled,
> > > and others.
> > >
> > > So, based on the previous consensus and the fact the signal was broken for
> > > a long time, Jason and I went and asked Infra to disable Travis-CI on our
> > > code repository. (Website repository was disabled months ago.)
> > >
> > > I believe there should be minimal impact of this. The only two elements of
> > > the Travis matrix that were passing (still) are Python SDK on the Linux &
> > > Mac. Linux one can be trivially moved to Jenkins -- and I know Jason is
> > > looking at that. Mac coverage is the only loss at the moment, but is
> > > something we can likely address in the (near) future.
> > >
> > > I'm excited that we finally managed to unify our CI tooling, and can make
> > > efforts on improving and maintaining one system as opposed to two. That
> > > said, please comment if you have any worries about this or ideas for
> > > further CI improvements ;-)
> > >
> > > Davor
Re: Status of our CI tools
+1

On Fri, Apr 28, 2017 at 6:24 PM, Thomas Groh wrote:

> +1! This will be really helpful when looking at my PRs; I basically get no
> signal from the current state of the github UI, and this will restore that
> to giving me very strong positive signal.
>
> On Fri, Apr 28, 2017 at 6:22 PM, Davor Bonaci wrote:
>
> > Early on in the project, we've discussed our CI needs and concluded to use
> > ASF-hosted Jenkins as our preferred tool of choice. We've also enabled
> > Travis-CI, which covered some scenarios that Jenkins couldn't do at the
> > time, but with the idea to transition to Jenkins eventually.
> >
> > Over the last few months, Travis-CI has been broken consistently, and
> > several different kinds of infrastructure breakages have been added, one on
> > top of another. This has caused plenty of cost and confusion. In
> > particular, contributors often get confused as to which signal they should
> > care about.
> >
> > At the same time, Jenkins capabilities have improved greatly: multiple
> > parallel precommits are now supported, checked-in DSL support, pipelined
> > matrix builds, Google's donation of Jenkins executors more than doubled,
> > and others.
> >
> > So, based on the previous consensus and the fact the signal was broken for
> > a long time, Jason and I went and asked Infra to disable Travis-CI on our
> > code repository. (Website repository was disabled months ago.)
> >
> > I believe there should be minimal impact of this. The only two elements of
> > the Travis matrix that were passing (still) are Python SDK on the Linux &
> > Mac. Linux one can be trivially moved to Jenkins -- and I know Jason is
> > looking at that. Mac coverage is the only loss at the moment, but is
> > something we can likely address in the (near) future.
> >
> > I'm excited that we finally managed to unify our CI tooling, and can make
> > efforts on improving and maintaining one system as opposed to two. That
> > said, please comment if you have any worries about this or ideas for
> > further CI improvements ;-)
> >
> > Davor
Re: Status of our CI tools
+1! This will be really helpful when looking at my PRs; I basically get no signal from the current state of the github UI, and this will restore that to giving me very strong positive signal.

On Fri, Apr 28, 2017 at 6:22 PM, Davor Bonaci wrote:

> Early on in the project, we've discussed our CI needs and concluded to use
> ASF-hosted Jenkins as our preferred tool of choice. We've also enabled
> Travis-CI, which covered some scenarios that Jenkins couldn't do at the
> time, but with the idea to transition to Jenkins eventually.
>
> Over the last few months, Travis-CI has been broken consistently, and
> several different kinds of infrastructure breakages have been added, one on
> top of another. This has caused plenty of cost and confusion. In
> particular, contributors often get confused as to which signal they should
> care about.
>
> At the same time, Jenkins capabilities have improved greatly: multiple
> parallel precommits are now supported, checked-in DSL support, pipelined
> matrix builds, Google's donation of Jenkins executors more than doubled,
> and others.
>
> So, based on the previous consensus and the fact the signal was broken for
> a long time, Jason and I went and asked Infra to disable Travis-CI on our
> code repository. (Website repository was disabled months ago.)
>
> I believe there should be minimal impact of this. The only two elements of
> the Travis matrix that were passing (still) are Python SDK on the Linux &
> Mac. Linux one can be trivially moved to Jenkins -- and I know Jason is
> looking at that. Mac coverage is the only loss at the moment, but is
> something we can likely address in the (near) future.
>
> I'm excited that we finally managed to unify our CI tooling, and can make
> efforts on improving and maintaining one system as opposed to two. That
> said, please comment if you have any worries about this or ideas for
> further CI improvements ;-)
>
> Davor
Status of our CI tools
Early on in the project, we discussed our CI needs and decided to use ASF-hosted Jenkins as our preferred tool. We also enabled Travis-CI, which covered some scenarios that Jenkins couldn't handle at the time, with the idea of transitioning to Jenkins eventually.

Over the last few months, Travis-CI has been consistently broken, with several different kinds of infrastructure breakage piling up one on top of another. This has caused plenty of cost and confusion. In particular, contributors often get confused as to which signal they should care about.

At the same time, Jenkins capabilities have improved greatly: multiple parallel precommits, checked-in DSL support, pipelined matrix builds, a more-than-doubled donation of Jenkins executors from Google, and others.

So, based on the previous consensus and the fact that the signal was broken for a long time, Jason and I asked Infra to disable Travis-CI on our code repository. (Travis-CI was disabled on the website repository months ago.)

I believe the impact of this should be minimal. The only two elements of the Travis matrix that were still passing are the Python SDK on Linux and Mac. The Linux one can be trivially moved to Jenkins -- and I know Jason is looking at that. Mac coverage is the only loss at the moment, but it is something we can likely address in the (near) future.

I'm excited that we finally managed to unify our CI tooling, and can put our effort into improving and maintaining one system as opposed to two. That said, please comment if you have any worries about this or ideas for further CI improvements ;-)

Davor
Re: Community hackathon
Thanks everyone! A quick summary:

* I've counted nearly 20 participants; there were possibly more.
* 11+ issues filed.
* 10+ bugs fixed.
* Some severe issues discovered, but most things did work as expected.
* Grew to 146 participants on the Slack channel, with nearly 20 new participants in the past several days.

I learned quite a few new tricks, and I think we made a dent in our overall stability -- thanks everyone!

Davor

On Wed, Apr 26, 2017 at 10:57 PM, Davor Bonaci wrote:

> Let's start!
>
> A few quick notes on how to get started:
> https://docs.google.com/document/d/1UKC2R_9FkSdMVTz2nt2sIW18KoLbIu6w0aj9bwSSPiw/edit
>
> There's no particular progress on the Jenkins infrastructure issue.
> Current workaround: ask on Slack, and a committer can manually kick-off a
> build and give you a link to the Jenkins job. Inspect the job to see the
> results.
>
> It's getting late on the US West Coast, but I plan to be available for the
> next few hours. Then, I'll disappear for a few hours and then show up in
> the morning.
>
> Hope to see many of you on Slack, and let's make this a success!
>
> Davor
>
> On Wed, Apr 26, 2017 at 2:02 PM, Davor Bonaci wrote:
>
>> The outage is still ongoing, unfortunately... I think we should make one
>> more (and final) delay.
>>
>> New (final) start time: 11 PM Pacific, and we start regardless of the
>> outage.
>>
>> On Wed, Apr 26, 2017 at 10:04 AM, Davor Bonaci wrote:
>>
>>> *** DELAYED START ***
>>>
>>> Unfortunately, due to the Jenkins infrastructure issue currently ongoing,
>>> we'll have to delay the start of the hackathon.
>>>
>>> New (tentative) start time: 2 PM Pacific (4 hours from now).
>>>
>>> I apologize for the delay.
>>>
>>> Davor
>>>
>>> On Wed, Apr 26, 2017 at 2:03 AM, Tibor Kiss wrote:
>>>

I've created a meetup in Budapest for this event:
https://www.meetup.com/futureofdata-budapest/events/239504356/

We (folks from the Hortonworks office @ Budapest) will try to prep a demo for the event and if time allows we'll jump into open issues.

On Tue, Apr 25, 2017 at 7:54 AM, Davor Bonaci wrote:

> Thanks everyone for the enthusiasm!
>
> Let's go with this Wednesday, 4/26, starting at 10 AM Pacific time, and
> running for the following 24 hours. I'll try to seed the
> instructions/starting point, and then let's take it from there.
>
> (Michael, invite sent.)
>
> Davor
>
> On Mon, Apr 24, 2017 at 7:47 PM, Michael Huston wrote:
>
> > Could you please add me to the Slack channel also? My apologies for the
> > noise on this mailing list and if there is a better way to request access.
> >
> > Cheers,
> > Michael
> >
> > On Mon, Apr 24, 2017 at 6:15 PM, Lukasz Cwik wrote:
> >
> > > Dylan, sent you invite to slack channel.
> > >
> > > On Mon, Apr 24, 2017 at 5:18 PM, Dylan Raithel <dylanrait...@gmail.com> wrote:
> > >
> > > > Can you please add me to the Slack channel?
> > > >
> > > > On Apr 24, 2017 12:51 AM, "Jean-Baptiste Onofré" <j...@nanthrax.net> wrote:
> > > >
> > > > > That's a wonderful idea !
> > > > >
> > > > > I think the easiest way to organize this event is using the Slack
> > > > > channels to discuss, help each other, and sync together.
> > > > >
> > > > > Regards
> > > > > JB
> > > > >
> > > > > On 04/24/2017 09:48 AM, Davor Bonaci wrote:
> > > > >
> > > > >> We've been working as a community towards the first stable release
> > > > >> for a while now, and I think we made a ton of progress across the
> > > > >> board over the last few weeks.
> > > > >>
> > > > >> We could try to organize a community-wide hackathon to identify and
> > > > >> fix those last few issues, as well as to get a better sense of the
> > > > >> overall project quality as it stands right now.
> > > > >>
> > > > >> This could be a self-organized event, and coordinated via the Slack
> > > > >> channel. For example, we (as a community and participants) can try
> > > > >> out the project in various ways -- quickstart, examples, different
> > > > >> runners, different platforms -- immediately fixing issues as we run
> > > > >> into them. It could last, say, 24 hours, with people from different
> > > > >> time zones participating at the time of their choosing.
> > > > >>
> > > > >> Thoughts?
> > > > >>
> > > > >> Davor
> > > > >>
> > > > > --
> > > > > Jean-Baptiste Onofré
> > > > > jbono...@apache.org
> > > > > http://blog.nanthrax.net
> > > > > Talend - http://www.talend.c
Re: [DISCUSSION] Encouraging more contributions
I think they can probably reach out to the mentor with questions like: How do I navigate the code base? What parts of the code could I use as a pattern? This could be done using whichever mode of communication the contributor prefers.

My opinion is that large projects and communities may come across as intimidating to first-time contributors, so being as welcoming and encouraging as possible is important.

On Thu, Apr 27, 2017 at 8:52 PM Aviem Zur wrote:

> @Sourabh Bajaj
>
> The mentoring on starter tickets is an interesting idea. How would it
> technically work?
>
> A new contributor assigns a starter ticket to themselves. What happens from
> there?
>
> On Tue, Apr 25, 2017 at 12:01 PM Ismaël Mejía wrote:
>
> > I think it is important to clarify that the developer documentation
> > discussed in this thread is of two kinds:
> >
> > 6.1. Documents with proposals and new designs, those covered by the
> > Beam Improvement Proposal (BEAM-566), and that we need to put with a
> > single file index (I remember there was a google dir for this but not
> > sure it is still valid, and in any case probably the website is a
> > better place for this). Is there any progress on this?
> >
> > 6.2. Documentation about how things work, so new developers can get
> > into developing features/fixes for the project, those are the kind
> > that Kenneth/Etienne mention and include Stephen’s IO guide but could
> > be definitely expanded to include things like how the different
> > runner translations work, or some details on triggers/materialization
> > of panes/windows from the SDK point of view. However the hard part of
> > these documents is that they should be maintained e.g. updated when the
> > code evolves so they don’t get outdated as JB mentions.
> >
> > On Tue, Apr 25, 2017 at 10:47 AM, Wesley Tanaka wrote:
> > > These are the ones I've come across so far, are there others?
> > >
> > > * Dynamic DoFn https://s.apache.org/a-new-dofn
> > >
> > > ** Splittable DoFn (Obsoletes Source API) http://s.apache.org/splittable-do-fn
> > >
> > > ** State and Timers for DoFn: https://s.apache.org/beam-state
> > >
> > > * Lateness https://s.apache.org/beam-lateness
> > >
> > > * Metrics API http://s.apache.org/beam-metrics-api
> > >
> > > ** I/O Metrics https://s.apache.org/standard-io-metrics
> > >
> > > * Runner API http://s.apache.org/beam-runner-api
> > >
> > > ** https://s.apache.org/beam-runner-composites
> > >
> > > ** https://s.apache.org/beam-side-inputs-1-pager
> > >
> > > * Fn API http://s.apache.org/beam-fn-api
> > >
> > > ---
> > > Wesley Tanaka
> > > https://wtanaka.com/
> > >
> > > On Monday, April 24, 2017, 2:45:45 PM HST, Sourabh Bajaj <sourabhba...@google.com.INVALID> wrote:
> > > For 6. I think having them in one page on the website where we can find the
> > > design docs more easily would be great.
> > >
> > > 7. For low-hanging-fruit, one thing I really liked from some Mozilla
> > > projects was assigning a mentor on the ticket. Someone you can reach out to
> > > if you have questions. I think this makes the entry barrier really low for
> > > first time contributors who might feel intimidated asking questions
> > > completely in public.
> > >
> > > On Mon, Apr 24, 2017 at 10:06 AM Kenneth Knowles wrote:
> > >
> > >> I like the subject Etienne has brought up, and will give it a number in
> > >> this list :-)
> > >>
> > >> 6. Have more technical reference docs (not just workspace set up) for
> > >> contributors.
> > >>
> > >> I think this overlaps a lot with a prior discussion about where to collect
> > >> design proposals [1]. Design docs used to be just dropped into a public
> > >> folder, but that got disorganized. And that thread was about work in
> > >> progress, so JIRA was a good place for details after a dev@ thread agrees
> > >> on a proposal. At this point, the designs are pretty solid conceptually or
> > >> even implemented and we could start to build out deeper technical bits on
> > >> the web site, or at least some place that people can find it. We do have
> > >> the Testing Guide and the PTransform Style Guide and somewhere near there
> > >> we could have deeper references. I think we need a broader vision for the
> > >> "table of contents" here.
> > >>
> > >> For my docs (triggers, lateness, runner API, side inputs, state, coders) I
> > >> haven't had time, but I do intend to both translate from GDoc to some other
> > >> format and also rewrite versions for users where appropriate. Probably this
> > >> will mean coming up with that table of contents.
> > >>
> > >> Kenn
> > >>
> > >> [1] https://lists.apache.org/thread.html/%3c6bc60c88-cf91-4fff-eae6-fea6ee06f...@nanthrax.net%3E
> > >>
> > >> On Mon, Apr 24, 2017 at 9:33 AM, Neelesh Salian <neeleshssal...@gmail.com> wrote:
> > >>
> > >> > Agreed. I have some old JIRAs that I am cleaning up.
> > >>
Re: How to control watermark when using BoundedSource
Hi Thomas,

Thanks for the explanation. Does it mean I cannot reproduce the real-time behavior of the replayed trace? Say the watermarks are perfect and FixedWindows groups elements into 1-minute windows, will the watermarks trigger the FixedWindows to fire roughly every minute?

I am a little confused about the "when available" behavior of the runner. Since the watermarks emitted by the BoundedSource will always be BoundedWindow.TIMESTAMP_MIN_VALUE except for the last watermark, how could the runner know when to trigger the computation on a window?

Thanks,
Shen

On Fri, Apr 28, 2017 at 1:13 PM, Thomas Groh wrote:

> You can't directly control the watermark that a BoundedSource emits.
> Windowing into FixedWindows will still work as you expect, however: your
> elements will be assigned to their windows based on the time the event
> occurred. Depending on the runner, triggers may be run either "when
> available" or after all the work is completed, but your output data will be
> as if you had a perfect watermark.
>
> On Fri, Apr 28, 2017 at 10:09 AM, Shen Li wrote:
>
> > Hi,
> >
> > Say I want to replay a data trace of last week using fixed windows. The
> > data trace is read from a file using TextIO. In order to trigger windows at
> > right times, how can I control the watermark emitted by the BoundedSource?
> >
> > Thanks,
> >
> > Shen
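
To make the question about firing times concrete, here is a toy model in plain Python, not Beam code and not how any particular runner is implemented. It assumes a window becomes ready once the watermark reaches the window's end, and it uses infinities as stand-ins for BoundedWindow.TIMESTAMP_MIN_VALUE and the final watermark. The names `fired_windows` and `replay` are hypothetical.

```python
MIN_TS = float("-inf")  # stand-in for BoundedWindow.TIMESTAMP_MIN_VALUE (assumption)
MAX_TS = float("inf")   # stand-in for the final watermark of a bounded source (assumption)


def fired_windows(window_ends, watermark):
    """Return the window end times that the watermark has reached, sorted."""
    return sorted(end for end in window_ends if end <= watermark)


def replay(window_ends, watermarks):
    """For each watermark update, list the windows that newly become ready."""
    done = set()
    steps = []
    for wm in watermarks:
        ready = [end for end in fired_windows(window_ends, wm) if end not in done]
        done.update(ready)
        steps.append(ready)
    return steps


# Three 1-minute windows, identified by their end time in seconds.
window_ends = [60, 120, 180]

# A "perfect" watermark that advances with the data: one window per step.
perfect = replay(window_ends, [30, 60, 120, 180])

# A watermark held at the minimum until the input is exhausted:
# nothing is ready until the last update, then every window is ready at once.
bounded = replay(window_ends, [MIN_TS, MIN_TS, MIN_TS, MAX_TS])
```

Under these assumptions the window contents are the same in both runs; only the moment at which each window becomes ready differs.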
Re: How to control watermark when using BoundedSource
You can't directly control the watermark that a BoundedSource emits. Windowing into FixedWindows will still work as you expect, however: your elements will be assigned to their windows based on the time the event occurred. Depending on the runner, triggers may be run either "when available" or after all the work is completed, but your output data will be as if you had a perfect watermark.

On Fri, Apr 28, 2017 at 10:09 AM, Shen Li wrote:

> Hi,
>
> Say I want to replay a data trace of last week using fixed windows. The
> data trace is read from a file using TextIO. In order to trigger windows at
> right times, how can I control the watermark emitted by the BoundedSource?
>
> Thanks,
>
> Shen
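
The point that elements land in windows according to when the event occurred, not when it was read, can be sketched in plain Python. This is only an illustration of the arithmetic behind fixed windows, not the Beam API; the function names and the sample trace are hypothetical, and timestamps are assumed to be integer seconds.

```python
from collections import defaultdict


def fixed_window_start(event_ts, size):
    """Start of the fixed window of length `size` that contains `event_ts`."""
    return event_ts - (event_ts % size)


def group_by_fixed_window(events, size=60):
    """Group (event_ts, value) pairs by window start; read order is irrelevant."""
    windows = defaultdict(list)
    for event_ts, value in events:
        windows[fixed_window_start(event_ts, size)].append(value)
    return dict(windows)


# Hypothetical replayed trace: (event time in seconds, payload), read out of order.
trace = [(125, "c"), (5, "a"), (61, "b"), (59, "d")]
result = group_by_fixed_window(trace)
```

Because the window is computed from the event timestamp alone, replaying the file faster or slower than real time does not change which window each element belongs to.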
How to control watermark when using BoundedSource
Hi,

Say I want to replay a data trace from last week using fixed windows. The data trace is read from a file using TextIO. In order to trigger windows at the right times, how can I control the watermark emitted by the BoundedSource?

Thanks,

Shen