Re: Contributing Twister2 runner to Apache Beam

2020-03-08 Thread Kenneth Knowles
I haven't heard anyone suggest that we need a vote. I haven't heard anyone
object to this being merged to master. Some time ago, we mostly decided to
favor master instead of branches, because it is so much smoother for
contributors and users.

So I am poking this thread one last time; otherwise I will consider it
consensus that once code review is done the runner is a part of Beam
(experimental!).

(don't wait for me on code review - if Ismaël said it is good, then it is
good.)

Kenn

On Fri, Mar 6, 2020 at 7:47 AM Pulasthi Supun Wickramasinghe <
pulasthi...@gmail.com> wrote:

> I understand that the discussion is on a broader level than the Twister2
> runner. From my experience developing the runner, the main advantage of
> being inside the Beam project was the easy access to the wide range of
> tests and other core/utility code, as Kyle pointed out. Unmerging runners
> that are not properly maintained and updated would be the most logical
> path to follow, since the internals of a runner are only well understood
> by the developers of that particular project. It would be unreasonable to
> expect the Beam community to maintain them. And since the runners do not
> alter the core APIs, I assume they would be easy to unmerge if the need
> arises.
>
> Talking specifically about the Twister2 runner, we hope to continue
> developing it in the future to add streaming capability and to develop a
> portable runner as well. The team behind Twister2 is working towards
> getting the project into the Apache Incubator in the near future
> (hopefully submitting the proposal in the next couple of months).
>
> Best Regards,
> Pulasthi
>
>
>
> On Thu, Mar 5, 2020 at 6:56 PM Robert Bradshaw 
> wrote:
>
>> I think we will get to a point where it makes sense for runners to
>> live in their own repositories, with their own release cadence, but
>> we're not at that point yet. One prerequisite is a stable API--we're
>> closing in on that with the portability protos, but many (java)
>> runners actually share the common runner core libraries and that is
>> even less set in stone.
>>
>> On the other hand, taking responsibility for maintaining all runners
>> is not a tenable or scalable position for the Beam project. If a
>> runner is merged, it should be understood that it can be "un-merged"
>> if it causes a maintenance burden. A completely separate
>> project/repository makes this less messy.
>>
>> On Thu, Mar 5, 2020 at 10:01 AM Kenneth Knowles  wrote:
>> >
>> > I agree with both of you, mostly :-)
>> >
>> > The monorepo approach doesn't work/scale well for shipped libraries
>> (name a Google library that silently just works and never causes any
>> dependency problems) and the pain we feel has been constant and increasing,
>> but I don't think we are at the breaking point.
>> >
>> > But Google's big monorepo [1] demonstrates similar benefits to what
>> Kyle describes. In the early stages the benefit of not having to think too
>> hard about build/test infra and share it everywhere is a big help, and it
>> scales well. Eventually, shipping test utility libraries and compliance
>> suites can be equivalent. And to your point - it is very helpful for users
>> to know that they can use CassandraIO with the other Beam artifacts. This
>> is why Google requires the whole big repo to depend on a single version of
>> any externally-controlled artifact. But, yes, as a consequence it is
>> preposterously difficult to stay up to date, since literally anything can
>> block progress. You need a unified escalation chain for that policy to make
>> sense. It is the definition of a healthy Apache project to *not* have that
>> (PMC is different).
>> >
>> > Independent dependencies, independent git histories, and independent
>> release cadence/process are all separate discussions.
>> >
>> > It is a broader question than this particular contribution, so let's
>> merge this runner before changing our whole way of doing things :-)
>> >
>> > Kenn
>> >
>> > [1]
>> https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext
>> (really quite a balanced analysis)
>> >
>> > On Wed, Mar 4, 2020 at 11:51 AM Kyle Weaver 
>> wrote:
>> >>
>> >> > Should runners, current and future, be in the same repository as Beam
>> >> > core?
>> >>
>> >> In the distant past, runners lived in their own repositories, and then
>> were donated to Beam. But Beam's current uber-repo setup allows a lot of
>> convenience. For example, a ton of code (including core functionality and
>> tests) is shared directly between runners, which is useful for keeping
>> runners up to date and ensuring consistent behavior between them (in other
>> words, maintainable and reliable).
>> >>
>> >> Generally, it is up to the authors of a particular Beam related
>> project/subproject to decide whether to host their code in Beam or in a
>> different repo, and up to the community to decide whether to take on the
>> donation, as 

Re: [DISCUSS] @Experimental annotations - processes and alternatives

2020-03-08 Thread Kenneth Knowles
On Sun, Mar 8, 2020 at 1:55 PM Ismaël Mejía  wrote:

> Kenn can you adjust the script to match only source code files
> ... otherwise it produces a lot of extra false positives


I think the sheet only had false matches in build/ directories. Removed.
Can you comment on other cells that look like a new class of false
positives?


> Also can we extract the full annotation as a column so we can filter/group
> for the full kind (type) of the experimental annotation e.g.
> @Experimental(Kind.SCHEMAS),
>

This was already done. It is column D. Maybe it is off the side of the
screen for you?


> we agreed with Luke Cwik was to remove the Experimental annotations from
> ‘runners/core*’
>

Makes sense; this was never end-user facing.


> It is probably worth re-running the script against the latest master
> because the results in the spreadsheet do not correspond to the current
> master.


Hmmm. I just checked, and the directory I ran it in has a detached
github/master checked out. So it might be a little stale, but not much.
Since people have started to sign up, it would be a shame to reset the
sheet. The files are probably still worth looking at, even if the line
numbers don't match, and if a file was already processed that is an easy
case.


> We also introduced package level Experimental annotations
> (package-info.java), so these can easily account for 50 duplicates that
> should probably be trimmed and assigned to the same person who is
> covering the corresponding files in the package. With all these
> adjustments we will easily be below 250 matches.
>

I agree that it is efficient, but I worry that package-level Experimental
annotations are basically invisible to users. Since I sorted by filename,
it should be easy to write your name once and then drag it to a whole set
of files? Really we mostly only care about "what file, and which KIND
annotations are present". I just made a new tab with that info, but it did
not gather all the different annotations that may be in the file.

Kenn


> Regards,
> Ismaël
>
> [1]
> https://lists.apache.org/thread.html/r73d3b19506ea435ee6be568ccc32065e36cd873dbbcf2a3e9049254e%40%3Cdev.beam.apache.org%3E
>
>
>
> On Fri, Mar 6, 2020 at 11:54 PM Kenneth Knowles  wrote:
> >
> > OK, I tried to make a tiny bit of progress on this: with `grep
> --ignore-case --line-number --recursive '@experimental' .` there are 578
> occurrences (including the website and comments). Via `| cut -d ':' -f 1 |
> sort | uniq | wc -l` there are 377 distinct code files.
> >
> > So that's a big project but easily scales to the contributors. I suggest
> we need to crowdsource a bit.
> >
> > I created
> https://docs.google.com/spreadsheets/d/1T98I7tFoUgwW2tegS5xbNRjaVDvZiBBLn7jg0Ef_IwU/edit?usp=sharing
> where you can suggest/comment adding your name to a file to volunteer to
> own going through the file.
> >
> > I have not checked git history to try to find owners.
> >
> > Kenn
> >
> > On Mon, Dec 2, 2019 at 10:26 AM Alexey Romanenko <
> aromanenko@gmail.com> wrote:
> >>
> >> Thank you Kenn for starting this discussion.
> >>
> >> As I see it, for now, the main goal for the “@Experimental" annotation
> is to revive it and make it useful in the sense its name suggests (which
> is obviously not the case at the moment). I'd suggest a somewhat
> simplified scenario for this:
> >>
> >> 1. We do a revision of all “@Experimental" annotation uses now. For
> the code (IOs/libs/etc) that we know 100% has been used in production for
> a long time with the current stable API, we just take this annotation
> away since it is not needed anymore.
> >>
> >> 2. For the code that is left after p.1, we leave it as
> “@Experimental”, wait for N releases (N=3?) and then take the annotation
> away if no breaking changes have happened. We may want to add a new
> argument to “@Experimental” to keep track of the release in which it was
> added.
> >>
> >> 3. We would need a regular “Experimental annotation report” (like we
> have for dependencies) sent to dev@, which will allow us to track new and
> outdated annotations.
> >>
> >> 4. And of course we update the contributor documentation about that.
> >>
> >> The idea of graduation by voting seems a bit complicated: for me it
> means that all newly added user APIs would have to go through this
> process, and I'm afraid that in the end we could be overwhelmed with the
> number of such polls. I think that several releases of maturation and a
> final decision by the person(s) responsible for the component should be
> enough.
> >>
> >> At the same time, I like Andrew's idea of checking for breaking
> changes with an external tool, which could let us remove the experimental
> state without any fear of breaking the API.
> >>
> >> In the case of breaking changes to a stable API that are impossible to
> avoid, we can still use @Deprecated and wait for 3 releases before
> removal (as we have already done before). So having up-to-date
> @Experimental and @Deprecated annotations won't be confusing for users.
> >>
> >>
> >>
> >>
> >>
> >> On 28 Nov 2019, at 04:48, Kenneth Knowles  

Re: [DISCUSS] @Experimental annotations - processes and alternatives

2020-03-08 Thread Ismaël Mejía
Kenn, can you adjust the script to match only source code files (`--include
\*.java --include \*.py --include \*.go`)? Otherwise it produces a lot of
extra false positives due to HTML files and cache files. Also, can we
extract the full annotation as a column so we can filter/group by the full
kind (type) of the experimental annotation, e.g. @Experimental(Kind.SCHEMAS),
@Experimental(Kind.SOURCE_SINK), etc.?
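
The suggested flags could be sketched end to end roughly as follows. This
is a minimal illustration, assuming GNU grep; the sample file names and
contents are made up and stand in for a real Beam checkout, so it is not
the exact script Kenn ran:

```shell
# Sample tree standing in for a Beam checkout (file names are made up).
tmp=$(mktemp -d)
printf '@Experimental(Kind.SCHEMAS)\nclass Foo {}\n' > "$tmp/Foo.java"
printf '@experimental mentioned in rendered docs\n' > "$tmp/notes.html"

# --include restricts the scan to source files, dropping the HTML hit.
matches=$(grep --recursive --line-number --ignore-case \
    --include '*.java' --include '*.py' --include '*.go' \
    '@experimental' "$tmp")

# --only-matching extracts the full annotation, usable as a "kind" column.
kinds=$(grep --recursive --only-matching --include '*.java' \
    '@Experimental([^)]*)' "$tmp")

echo "$matches"
echo "$kinds"
rm -rf "$tmp"
```

With `--include` in place, only Foo.java is reported, and the second grep
yields the `@Experimental(Kind.SCHEMAS)` annotation for grouping.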

This way we can group occurrences per kind and quickly triage some of them
which are clearly still experimental (with ongoing independent stabilization
efforts [1]), like these:
@Experimental(Kind.SCHEMAS)
@Experimental(Kind.SPLITTABLE_DO_FN)
@Experimental(Kind.PORTABILITY)
(and probably @Experimental(Kind.CONTEXTFUL))

I have been adjusting the Experimental annotations over the last weeks to
follow the @Experimental(Kind.FOO) pattern with this future triage in mind,
so it is good to see the effort may pay off :) As part of this work, one
idea we agreed on with Luke Cwik was to remove the Experimental annotations
from ‘runners/core*’, because historically Beam has not had strong
compatibility guarantees for users of these APIs (runner authors). It is
probably worth re-running the script against the latest master because the
results in the spreadsheet do not correspond to the current master. (Note
that the remaining External class is still tagged as Experimental because
it is still pending a move into ‘sdks/java/core’.)

Not related to Experimental, but worth mentioning: we also started tagging
sdks/java/core/src/main/java/org/apache/beam/sdk/util/*
sdks/java/core/src/main/java/org/apache/beam/sdk/testing/*
as @Internal for the same reasons. Classes in both packages are basically
for internal use by the Beam SDK harness, for runner authors, and for
tests, and pipeline authors should not rely on their stability.

We also introduced package level Experimental annotations
(package-info.java), so these can easily account for 50 duplicates that
should probably be trimmed and assigned to the same person who is covering
the corresponding files in the package. With all these adjustments we will
easily be below 250 matches.
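
One way to keep those package-level duplicates out of the raw tally is
grep's `--exclude` flag. A small sketch, with a made-up two-file tree and
GNU grep assumed:

```shell
# Two annotated files, one of them a package-level package-info.java.
tmp=$(mktemp -d)
printf '@Experimental(Kind.SCHEMAS)\npackage org.example;\n' > "$tmp/package-info.java"
printf '@Experimental(Kind.SCHEMAS)\nclass Bar {}\n' > "$tmp/Bar.java"

# --exclude drops the package-level annotations from the count,
# so only Bar.java contributes here.
count=$(grep --recursive --ignore-case --include '*.java' \
    --exclude 'package-info.java' '@experimental' "$tmp" | wc -l)

echo "$count"
rm -rf "$tmp"
```

Applied to a real checkout, the same `--exclude` would subtract the
package-info.java matches from the overall occurrence count.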

Regards,
Ismaël

[1] 
https://lists.apache.org/thread.html/r73d3b19506ea435ee6be568ccc32065e36cd873dbbcf2a3e9049254e%40%3Cdev.beam.apache.org%3E



On Fri, Mar 6, 2020 at 11:54 PM Kenneth Knowles  wrote:
>
> OK, I tried to make a tiny bit of progress on this: with `grep
> --ignore-case --line-number --recursive '@experimental' .` there are 578
> occurrences (including the website and comments). Via `| cut -d ':' -f 1 |
> sort | uniq | wc -l` there are 377 distinct code files.
>
> So that's a big project but easily scales to the contributors. I suggest we 
> need to crowdsource a bit.
>
> I created 
> https://docs.google.com/spreadsheets/d/1T98I7tFoUgwW2tegS5xbNRjaVDvZiBBLn7jg0Ef_IwU/edit?usp=sharing
>  where you can suggest/comment adding your name to a file to volunteer to own 
> going through the file.
>
> I have not checked git history to try to find owners.
>
> Kenn
>
> On Mon, Dec 2, 2019 at 10:26 AM Alexey Romanenko  
> wrote:
>>
>> Thank you Kenn for starting this discussion.
>>
> >> As I see it, for now, the main goal for the “@Experimental" annotation
> >> is to revive it and make it useful in the sense its name suggests
> >> (which is obviously not the case at the moment). I'd suggest a somewhat
> >> simplified scenario for this:
>>
> >> 1. We do a revision of all “@Experimental" annotation uses now. For the
> >> code (IOs/libs/etc) that we know 100% has been used in production for a
> >> long time with the current stable API, we just take this annotation
> >> away since it is not needed anymore.
>>
> >> 2. For the code that is left after p.1, we leave it as “@Experimental”,
> >> wait for N releases (N=3?) and then take the annotation away if no
> >> breaking changes have happened. We may want to add a new argument to
> >> “@Experimental” to keep track of the release in which it was added.
>>
> >> 3. We would need a regular “Experimental annotation report” (like we
> >> have for dependencies) sent to dev@, which will allow us to track new
> >> and outdated annotations.
>>
> >> 4. And of course we update the contributor documentation about that.
>>
> >> The idea of graduation by voting seems a bit complicated: for me it
> >> means that all newly added user APIs would have to go through this
> >> process, and I'm afraid that in the end we could be overwhelmed with
> >> the number of such polls. I think that several releases of maturation
> >> and a final decision by the person(s) responsible for the component
> >> should be enough.
>>
> >> At the same time, I like Andrew's idea of checking for breaking changes
> >> with an external tool, which could let us remove the experimental state
> >> without any fear of breaking the API.
>>
> >> In the case of breaking changes to a stable API that are impossible to
> >> avoid, we can still use @Deprecated and wait for 3 releases before
> >> removal (as we have already done before). So having up-to-date
> >> @Experimental and @Deprecated annotations won't be confusing for users.

Re: Install Jenkins AnsiColor plugin

2020-03-08 Thread Chad Dombrova
I don’t believe that it was ever resolved.  I have a PR with a bunch of
attempts to get it working but I never did figure it out.  IIRC there did
seem to be some ansi plugin already installed but I couldn’t get it to
work.

-chad


On Sun, Mar 8, 2020 at 10:52 AM Ismaël Mejía  wrote:

> Did this ever happen? If not what is blocking it?
>
>
>
> On Tue, Oct 22, 2019 at 10:13 PM Udi Meiri  wrote:
> >
> > Your proposal will only affect the seed job (which doesn't do color
> outputs AFAIK).
> > I think you want to add colorizeOutput() here:
> >
> https://github.com/apache/beam/blob/bfebbd0d16361f61fa40bfdec2f0cb6f943f7c9a/.test-infra/jenkins/CommonJobProperties.groovy#L79-L95
> >
> > Otherwise no concerns from me.
> >
> > On Tue, Oct 22, 2019 at 12:01 PM Chad Dombrova 
> wrote:
> >>
> >> thanks, so IIUC, I’m going to update job_00_seed.groovy like this:
> >>
> >>   wrappers {
> >> colorizeOutput()
> >> timeout {
> >>   absolute(60)
> >>   abortBuild()
> >> }
> >>   }
> >>
> >> Then add the comment "run seed job".
> >>
> >> Does anyone have any concerns with me trying this out now?
> >>
> >> -chad
> >>
> >>
> >> On Tue, Oct 22, 2019 at 11:42 AM Udi Meiri  wrote:
> >>>
> >>> Also note that changing the job DSL doesn't take effect until the
> "seed" job runs. (use the "run seed job" phrase)
> >>>
> >>> On Tue, Oct 22, 2019 at 11:06 AM Chad Dombrova 
> wrote:
> 
>  Thanks, I'll look into this.  I have a PR I'm building up with a
> handful of minor changes related to this.
> 
> 
> 
>  On Tue, Oct 22, 2019 at 10:45 AM Yifan Zou 
> wrote:
> >
> > Thanks, Udi! The ansicolor plugin was applied to ASF Jenkins
> universally. You might need to explicitly enable color output in your
> Jenkins DSL.
> >
> > On Tue, Oct 22, 2019 at 10:33 AM Udi Meiri  wrote:
> >>
> >> Seems to be already installed:
> https://issues.apache.org/jira/browse/INFRA-16944
> >> Do we just need to enable it somehow?
> >> This might work:
> https://jenkinsci.github.io/job-dsl-plugin/#method/javaposse.jobdsl.dsl.helpers.wrapper.WrapperContext.colorizeOutput
> >>
> >> BTW, our Jenkins is maintained by ASF's Infrastructure team:
> https://cwiki.apache.org/confluence/display/INFRA/Jenkins
> >>
> >> On Tue, Oct 22, 2019 at 10:23 AM Chad Dombrova 
> wrote:
> >>>
> >>> Hi all,
> >>> As a user trying to grok failures in Jenkins, I think it would be a
> huge help to have color output support. This is something that works out
> of the box for CI tools like GitLab and Travis, and it really helps bring
> that 21st-century feel to your logs :)
> >>>
> >>> There's a Jenkins plugin for colorizing ansi escape sequences here:
> >>> https://plugins.jenkins.io/ansicolor
> >>>
> >>> I think this is something that has to be deployed by a Jenkins
> admin.
> >>>
> >>> -chad
> >>>
>
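
The ANSI escape sequences discussed in this thread can be sketched in a
few lines of shell. This is a minimal illustration unrelated to any
specific Jenkins job; the sample text is made up:

```shell
# ANSI SGR sequences have the form ESC [ <codes> m; a colorizing
# console (or a plugin like AnsiColor) renders the text between them
# in the requested style instead of printing the raw escape bytes.
esc=$(printf '\033')
line="${esc}[31mFAILED${esc}[0m 3 tests, ${esc}[32mPASSED${esc}[0m 42"
echo "$line"

# Stripping the escapes recovers the plain text a non-colorizing
# console would effectively be showing, minus the styling.
plain=$(printf '%s\n' "$line" | sed "s/${esc}\[[0-9;]*m//g")
echo "$plain"
```

On a terminal that honors SGR codes, the first line prints "FAILED" in
red and "PASSED" in green; without such support, the raw `[31m`-style
sequences show up as log noise, which is what the plugin avoids.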


Re: Install Jenkins AnsiColor plugin

2020-03-08 Thread Ismaël Mejía
Did this ever happen? If not what is blocking it?



On Tue, Oct 22, 2019 at 10:13 PM Udi Meiri  wrote:
>
> Your proposal will only affect the seed job (which doesn't do color outputs 
> AFAIK).
> I think you want to add colorizeOutput() here:
> https://github.com/apache/beam/blob/bfebbd0d16361f61fa40bfdec2f0cb6f943f7c9a/.test-infra/jenkins/CommonJobProperties.groovy#L79-L95
>
> Otherwise no concerns from me.
>
> On Tue, Oct 22, 2019 at 12:01 PM Chad Dombrova  wrote:
>>
>> thanks, so IIUC, I’m going to update job_00_seed.groovy like this:
>>
>>   wrappers {
>> colorizeOutput()
>> timeout {
>>   absolute(60)
>>   abortBuild()
>> }
>>   }
>>
>> Then add the comment "run seed job".
>>
>> Does anyone have any concerns with me trying this out now?
>>
>> -chad
>>
>>
>> On Tue, Oct 22, 2019 at 11:42 AM Udi Meiri  wrote:
>>>
>>> Also note that changing the job DSL doesn't take effect until the "seed" 
>>> job runs. (use the "run seed job" phrase)
>>>
>>> On Tue, Oct 22, 2019 at 11:06 AM Chad Dombrova  wrote:

 Thanks, I'll look into this.  I have a PR I'm building up with a handful 
 of minor changes related to this.



 On Tue, Oct 22, 2019 at 10:45 AM Yifan Zou  wrote:
>
> Thanks, Udi! The ansicolor plugin was applied to ASF Jenkins universally.
> You might need to explicitly enable color output in your Jenkins DSL.
>
> On Tue, Oct 22, 2019 at 10:33 AM Udi Meiri  wrote:
>>
>> Seems to be already installed: 
>> https://issues.apache.org/jira/browse/INFRA-16944
>> Do we just need to enable it somehow?
>> This might work: 
>> https://jenkinsci.github.io/job-dsl-plugin/#method/javaposse.jobdsl.dsl.helpers.wrapper.WrapperContext.colorizeOutput
>>
>> BTW, our Jenkins is maintained by ASF's Infrastructure team: 
>> https://cwiki.apache.org/confluence/display/INFRA/Jenkins
>>
>> On Tue, Oct 22, 2019 at 10:23 AM Chad Dombrova  wrote:
>>>
>>> Hi all,
>>> As a user trying to grok failures in Jenkins, I think it would be a huge
>>> help to have color output support. This is something that works out of
>>> the box for CI tools like GitLab and Travis, and it really helps bring
>>> that 21st-century feel to your logs :)
>>>
>>> There's a Jenkins plugin for colorizing ansi escape sequences here:
>>> https://plugins.jenkins.io/ansicolor
>>>
>>> I think this is something that has to be deployed by a Jenkins admin.
>>>
>>> -chad
>>>