Re: [DISCUSS] @Experimental annotations - processes and alternatives

2020-03-09 Thread Kenneth Knowles
I have always been unsure whether the "kind" is useful. For users, I doubt
it is. Now I see that it can be helpful for finding all the occurrences,
by using an IDE to find references when you want to delete them. But many
of the "kind" enums are not really categories that you can delete all at
once; "SOURCE_SINK", for example, is so generic that you cannot treat its
uses as a group. I would really just group by artifact first, then
package/directory, then file. Some judgment can be applied, and I think it
will not be very difficult.

Kenn

On Mon, Mar 9, 2020 at 12:01 PM Alexey Romanenko wrote:

> Thanks Kenn for moving this forward.
>
> Though, what still bugs me is: do we have a consensus about what we
> actually do with the different types of annotations?
> Can we say, for example, that
> “@Experimental(Experimental.Kind.SOURCE_SINK)” is useless and we can get
> rid of it easily? Or, since the Schema API is still under development, that
> “@Experimental(Kind.SCHEMAS)” is required everywhere Schema is used in a
> public API? And so on…
> Does it make sense to split this list by type of experimental annotation,
> with the final decision for each type depending on that?

Re: [DISCUSS] @Experimental annotations - processes and alternatives

2020-03-09 Thread Alexey Romanenko
Thanks Kenn for moving this forward. 

Though, what still bugs me is: do we have a consensus about what we actually
do with the different types of annotations?
Can we say, for example, that “@Experimental(Experimental.Kind.SOURCE_SINK)” is
useless and we can get rid of it easily? Or, since the Schema API is still
under development, that “@Experimental(Kind.SCHEMAS)” is required everywhere
Schema is used in a public API? And so on…
Does it make sense to split this list by type of experimental annotation, with
the final decision for each type depending on that?


Re: [DISCUSS] @Experimental annotations - processes and alternatives

2020-03-08 Thread Kenneth Knowles
On Sun, Mar 8, 2020 at 1:55 PM Ismaël Mejía  wrote:

> Kenn, can you adjust the script to match only source code files
> ... otherwise it produces a lot of extra false positives


I think the sheet only had false matches in build/ directories. Removed.
Can you comment on other cells that look like a new class of false
positives?


> Also, can we extract the full annotation as a column so we can filter/group
> by the full kind (type) of the experimental annotation, e.g.
> @Experimental(Kind.SCHEMAS),
>

This was already done; it is column D. Maybe it is off the side of the
screen for you?


> one idea we agreed on with Luke Cwik was to remove the Experimental
> annotations from ‘runners/core*’
>

Makes sense; this was never end-user facing.


> It is probably worth re-running the script against the latest master
> because the results in the spreadsheet do not correspond to the current
> master.


Hmmm. I just checked, and the directory I ran it in has a detached
github/master checked out. So it might be a little stale, but not much.
Since people have started to sign up, it would be a shame to reset the sheet.
The files are probably still worth looking at, even if the line numbers don't
match, and if a file was already processed that is an easy case.


> We also introduced package-level Experimental annotations
> (package-info.java), which can easily account for 50 duplicates; these
> should probably be trimmed by the same person who is covering the
> corresponding files in the package. With all these adjustments we will
> easily be below 250 matches.
>

I agree that it is efficient, but I worry that package-level experimental
annotations are basically invisible to users. Since I sorted by filename, it
should be easy to write your name once and then drag it down to a whole set
of files. Really, we mostly only care about "what file, and which KIND
annotations are present". I just made a new tab with that info, but it does
not gather all the different annotations that may be in a file.

Kenn


> Regards,
> Ismaël
>
> [1]
> https://lists.apache.org/thread.html/r73d3b19506ea435ee6be568ccc32065e36cd873dbbcf2a3e9049254e%40%3Cdev.beam.apache.org%3E

Re: [DISCUSS] @Experimental annotations - processes and alternatives

2020-03-08 Thread Ismaël Mejía
Kenn, can you adjust the script to match only source code files: `--include
\*.java --include \*.py --include \*.go`? Otherwise it produces a lot of extra
false positives due to html files and cache files. Also, can we extract the full
annotation as a column so we can filter/group by the full kind (type) of the
experimental annotation, e.g. @Experimental(Kind.SCHEMAS),
@Experimental(Kind.SOURCE_SINK), etc.?

This way we can group occurrences per kind and quickly triage the ones that are
clearly still experimental (and with ongoing independent stabilization efforts
[1]), like these:
@Experimental(Kind.SCHEMAS)
@Experimental(Kind.SPLITTABLE_DO_FN)
@Experimental(Kind.PORTABILITY)
(and probably @Experimental(Kind.CONTEXTFUL))

Over the last weeks I have been adjusting the Experimental annotations to
follow the @Experimental(Kind.FOO) pattern with this future triage in mind, so
it is good to see the effort may pay off :) As part of this work, one idea we
agreed on with Luke Cwik was to remove the Experimental annotations from
‘runners/core*’, because historically Beam has not had strong compatibility
guarantees for users of these APIs (runner authors). It is probably worth
re-running the script against the latest master because the results in the
spreadsheet do not correspond to the current master. (Note that the remaining
External class is still tagged as Experimental because it is still pending a
move into ‘sdks/java/core’.)
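
For reference, the class-level pattern looks roughly like this (a minimal
sketch; it assumes Beam's org.apache.beam.sdk.annotations.Experimental
annotation with its nested Kind enum, and the class name here is made up):

```java
import org.apache.beam.sdk.annotations.Experimental;
import org.apache.beam.sdk.annotations.Experimental.Kind;

// The whole class is tagged with an explicit kind, so occurrences can be
// grouped and triaged per kind rather than as one undifferentiated pile.
@Experimental(Kind.SCHEMAS)
public class SomeSchemaAwareApi {
  // ...
}
```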

Not related to Experimental, but worth mentioning: we also started tagging
sdks/java/core/src/main/java/org/apache/beam/sdk/util/*
sdks/java/core/src/main/java/org/apache/beam/sdk/testing/*
as @Internal for the same reasons. Classes in both packages are basically for
internal use in the Beam SDK harness, for runner authors, and for tests, and
pipeline authors should not rely on their stability.

We also introduced package-level Experimental annotations (package-info.java),
which can easily account for 50 duplicates; these should probably be trimmed by
the same person who is covering the corresponding files in the package. With
all these adjustments we will easily be below 250 matches.
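
A package-level annotation lives in the package's package-info.java and covers
every class in the package; roughly like this (a sketch — the package name is
just an example):

```java
// package-info.java: one annotation marks the whole package experimental.
@Experimental(Kind.SOURCE_SINK)
package org.apache.beam.sdk.io.somedb;

import org.apache.beam.sdk.annotations.Experimental;
import org.apache.beam.sdk.annotations.Experimental.Kind;
```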

Regards,
Ismaël

[1] 
https://lists.apache.org/thread.html/r73d3b19506ea435ee6be568ccc32065e36cd873dbbcf2a3e9049254e%40%3Cdev.beam.apache.org%3E

Re: [DISCUSS] @Experimental annotations - processes and alternatives

2020-03-06 Thread Kenneth Knowles
OK, I tried to make a tiny bit of progress on this. With `grep --ignore-case
--line-number --recursive '@experimental' .` there are 578 occurrences
(including the website and comments). Via `| cut -d ':' -f 1 | sort | uniq | wc
-l` there are 377 distinct code files.

So that's a big project, but it scales easily across contributors. I suggest
we crowdsource a bit.

I created
https://docs.google.com/spreadsheets/d/1T98I7tFoUgwW2tegS5xbNRjaVDvZiBBLn7jg0Ef_IwU/edit?usp=sharing
where you can add your name (via suggestion/comment) next to a file to
volunteer to own going through it.

I have not checked git history to try to find owners.

Kenn



Re: [DISCUSS] @Experimental annotations - processes and alternatives

2019-12-02 Thread Alexey Romanenko
Thank you Kenn for starting this discussion.

As I see it, the main goal for the “@Experimental” annotation is to become
meaningful again and be useful in the sense its name implies (which is
obviously not the case at the moment). I'd suggest a somewhat simplified
scenario for this:

1. We review all current uses of the “@Experimental” annotation. For code
(IOs/libs/etc.) that we know for certain has been used in production for a
long time with the current stable API, we simply remove the annotation, since
it is no longer needed.

2. The code that is left after step 1 stays “@Experimental”; we wait for N
releases (N=3?) and then remove the annotation if no breaking changes have
happened. We may want to add a new argument to “@Experimental” to keep track
of the release in which it was added (a rough sketch follows this list).

3. We would need a regular “Experimental annotation report” (like the one we
have for dependencies) sent to dev@, which would allow us to track new and
outdated annotations.

4. And of course we update the contributor documentation accordingly.
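
For point 2, a hypothetical extension of the annotation could look like this
(a sketch only, not the current Beam definition; the element and enum names
are made up):

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

@Documented
@Retention(RetentionPolicy.CLASS)
public @interface Experimental {
  Kind value() default Kind.UNSPECIFIED;

  // Hypothetical new argument: the release in which the annotation was added,
  // so a periodic report can flag annotations older than N releases.
  String sinceRelease() default "";

  enum Kind { UNSPECIFIED, SOURCE_SINK, SCHEMAS, SPLITTABLE_DO_FN }
}
```

Usage would then be, e.g., @Experimental(value = Kind.SOURCE_SINK,
sinceRelease = "2.19.0").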

The idea of graduation by voting seems a bit complicated: for me it means that
every newly added user API would have to go through this process, and I'm
afraid that in the end we could be overwhelmed by the number of such polls. I
think that several releases of maturation, plus a final decision by the
person(s) responsible for the component, should be enough.

At the same time, I like Andrew's idea of checking for breaking changes with
an external tool. It could let us remove the experimental state without any
fear of breaking the API.

For breaking changes to a stable API that cannot be avoided, we can still use
@Deprecated and wait for 3 releases before removal (as we have already done
before). So, having up-to-date @Experimental and @Deprecated annotations
won't be confusing for users.








Re: [DISCUSS] @Experimental annotations - processes and alternatives

2019-11-27 Thread Kenneth Knowles
On Wed, Nov 27, 2019 at 1:04 PM Elliotte Rusty Harold wrote:

> On Wed, Nov 27, 2019 at 1:12 PM Kenneth Knowles  wrote:
> >
>
> > *Opt-in*: This is a powerful idea that I think changes everything.
> >- for an experimental new IO, a separate artifact; this way we can
> also see downloads
> >- for experimental code fragments, add checkState that the relevant
> experiment is turned on via flags
>
> To be clear the experimental artifact would have the same group ID and
> artifact ID but a different version than the non-experimental
> artifacts?  E.g.
> org.apache.beam:beam-runners-gcp-gcemd:2.4.0-experimental
>
> That could work. Changing the artifact ID or the package name would
> risk split package issues and diamond dependency problems. We'd still
> need to be careful about mixing experimental and non-experimental
> artifacts.
>

That's clever! I think using the classifier might be better than a modified
version number, e.g. org.apache.beam:beam-io-mydb:2.4.0:experimental

My prior idea was much less clever: for any version 2.X there would be either
beam-io-mydb-experimental or beam-io-mydb (after graduation), so no problem
with a split package. There would be no "same artifact id" concern.

Your idea would allow us to ship two variants of the library, if we developed
the tooling for it. I think stripping the experimental bits and ensuring both
variants compile might be tricky unless we are stripping a rather disjoint
piece of the library.

Kenn


Re: [DISCUSS] @Experimental annotations - processes and alternatives

2019-11-27 Thread Elliotte Rusty Harold
On Wed, Nov 27, 2019 at 1:12 PM Kenneth Knowles  wrote:
>

> *Opt-in*: This is a powerful idea that I think changes everything.
>- for an experimental new IO, a separate artifact; this way we can also 
> see downloads
>- for experimental code fragments, add checkState that the relevant 
> experiment is turned on via flags

To be clear the experimental artifact would have the same group ID and
artifact ID but a different version than the non-experimental
artifacts?  E.g.
org.apache.beam:beam-runners-gcp-gcemd:2.4.0-experimental

That could work. Changing the artifact ID or the package name would
risk split package issues and diamond dependency problems. We'd still
need to be careful about mixing experimental and non-experimental
artifacts.



-- 
Elliotte Rusty Harold
elh...@ibiblio.org


Re: [DISCUSS] @Experimental annotations - processes and alternatives

2019-11-27 Thread Andrew Pilloud
What we need is an annotation checker to ensure every public method is
tagged either @Experimental or @Deprecated. That way there will be no
confusion about what we expect to be stable. If we really want to
offer stable APIs, there are many tools (such as JAPICC [1]) to ensure
we don't make breaking changes. Without some actual tests we are just
hoping we don't break anything, so everything is effectively
@Experimental.
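
A reflective test could be a crude first cut (a sketch only — it assumes the
annotation is retained at runtime, which may not hold for a CLASS-retention
annotation; a real checker would more likely be an annotation processor or a
bytecode scan):

```java
import java.lang.annotation.Annotation;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class StabilityTagCheck {
  // Match by simple name so the check does not depend on a specific package.
  static boolean hasAnnotationNamed(Method m, String simpleName) {
    for (Annotation a : m.getAnnotations()) {
      if (a.annotationType().getSimpleName().equals(simpleName)) {
        return true;
      }
    }
    return false;
  }

  // Throws if any public method is neither @Experimental nor @Deprecated.
  public static void check(Class<?> apiClass) {
    for (Method m : apiClass.getDeclaredMethods()) {
      if (Modifier.isPublic(m.getModifiers())
          && !hasAnnotationNamed(m, "Experimental")
          && !m.isAnnotationPresent(Deprecated.class)) {
        throw new AssertionError(
            apiClass.getName() + "#" + m.getName()
                + " is public but neither @Experimental nor @Deprecated");
      }
    }
  }
}
```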

Andrew

[1] https://lvc.github.io/japi-compliance-checker/

On Wed, Nov 27, 2019 at 10:12 AM Kenneth Knowles  wrote:
>
> Hi all
>
> I wanted to start a dedicated thread to the discussion of how to manage our 
> @Experimental annotations, API evolution in general, etc.
>
> After some email back-and-forth this will get too big so then I will try to 
> summarize into a document. But I think a thread to start with makes sense.
>
> Problem statement:
>
> 1. Users need stable APIs so their software can just keep working
> 2. Breaking changes are necessary to achieve correctness / high quality
>
> Neither of these is actually universally true. Many changes don't really hurt 
> users, and some APIs are so obvious they don't require adjustment, or 
> continuing to use an inferior API is OK since at least correctness is 
> possible.
>
> But we have had too many breaking changes in Beam, some quite late, for the 
> purposes of fixing major data loss bugs, design errors, changes in underlying 
> services, and usability. [1] So I take for granted that we do need to make 
> these changes.
>
> So the problem becomes:
>
> 1. Users need to know *which* APIs are frozen, clearly and with enough buy-in 
> that changes don't surprise them
> 2. Useful APIs that are not technically frozen but never change will still 
> get usage and should "graduate"
>
> Current status:
>
>  - APIs (classes, methods, etc) can be marked "experimental" with annotations 
> in languages
>  - "experimental" features are shipped in the same jar with non-experimental 
> bits; sometimes it is just a couple methods or classes
>  - "experimental" APIs are supposed to allow breaking changes
>  - there is no particular process for removing "experimental" status
>  - we do go through "deprecation" process even for experimental things
>
> Downsides to this:
>
>  - tons of Beam have become very mature but are still "experimental", so it 
> isn't really safe to make breaking changes
>  - users are not really alerted that well to when they are using unstable 
> pieces
>  - we don't have an easy way to determine the impact of any breaking changes
>  - we also don't have a clear policy or guidance around underlying 
> services/client libs making breaking changes (such as services rejecting 
> older clients)
>  - having something both "experimental" and "deprecated" is maybe confusing, 
> but also just deleting experimental stuff is not safe in the current state of 
> things
>
> Some proposals that I can think of people made:
>
>  - making experimental features opt-in only (for example by a separate dep or 
> a flag)
>  - putting a version next to any experimental annotation and forcing review 
> at that time (lots of objections to this, but noting it for completeness)
>  - reviews for graduating on a case-by-case basis, with dev@ thread and maybe 
> vote
>  - try to improve our ability to know usage of experimental features (really, 
> all features!)
>
> I will start with my own thoughts from here:
>
> *Opt-in*: This is a powerful idea that I think changes everything.
>- for an experimental new IO, a separate artifact; this way we can also 
> see downloads
>- for experimental code fragments, add checkState that the relevant 
> experiment is turned on via flags
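> 
> For illustration, a minimal sketch of the flag-gated fragment (the experiment
> name "my_new_io" and the flag plumbing here are hypothetical, not an existing
> Beam API; checkState is Guava's Preconditions.checkState):
> 
> ```java
> import static com.google.common.base.Preconditions.checkState;
> 
> import java.util.List;
> 
> class ExperimentalFragment {
>   // enabledExperiments would come from a flag such as --experiments=my_new_io.
>   static void expand(List<String> enabledExperiments) {
>     // Fail fast unless the user explicitly opted in to the experiment.
>     checkState(
>         enabledExperiments.contains("my_new_io"),
>         "This code path is experimental; pass --experiments=my_new_io to opt in.");
>     // ... experimental logic runs only beyond this point ...
>   }
> }
> ```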
>
> *Graduation*: Once things are opt-in, the drive to graduate them will be 
> stronger than it is today. I think a vote is appropriate, with rationale 
> including usage, test coverage, and stability, since graduation is a 
> commitment by the community to maintain the code, and maintenance constitutes 
> most of the TCO of code.
>
> *Tracking*:
>  - We should know what experiments we have and how old they are.
>  - It means that just tagging methods and classes "@Experimental" doesn't 
> really work. I think that is probably a good thing. It is confusing to have 
> hundreds of tiny experiments. We can target larger-scale experiments.
>  - If we regularly poll on twitter or user@ about features then it might 
> become a source of OK signal, for things where we cannot look at download 
> stats.
>
> I think with these three approaches, the @Experimental annotation is actually 
> obsolete. We could still use it to drive some kind of annotation processor to 
> ensure "if there is @Experimental then there is a checkState" but I don't 
> have experience doing such things.
>
> Kenn
>
> [1] 
> https://lists.apache.org/thread.html/1bfe7aa55f8d77c4ddfde39595c9473b233edfcc3255ed38b3f85612@%3Cdev.beam.apache.org%3E