Re: Missing copyright notices due to LICENSE change

2021-03-17 Thread Ahmet Altay
Thank you Rebo. I agree with reverting first and then figuring out the next
steps.

Here is a PR to revert your change:
https://github.com/apache/beam/pull/14267

On Wed, Mar 17, 2021 at 4:02 PM Robert Burke  wrote:

> Looking at the history, it seems that before the Python text was added,
> pkg.go.dev could parse the license stack just fine. It doesn't recognize
> the PSF license, and fails closed entirely as a result.
>
> I've filed an issue with pkg.go.dev (
> https://github.com/golang/go/issues/45095). If the bug is fixed, the
> affected versions will become visible as well.
>
> In the meantime, we should revert my change, which clobbered the other
> licenses, and probably cherry-pick it into the affected release branches.
>
> The PSF license is annoying as it's explicitly unique. Nothing but Python
> can use it and call it the PSF license. However, it is a
> redistribution-friendly license, which is what matters.
>
> On Wed, Mar 17, 2021, 3:00 PM Ahmet Altay  wrote:
>
>> Thank you for this email.
>>
>>
>> On Wed, Mar 17, 2021 at 2:32 PM Brian Hulette 
>> wrote:
>>
>>> I just noticed that there was a recent change to our LICENSE file to
>>> make it exactly match the Apache 2.0 License [1]. This seems to be the
>>> result of two conflicting LICENSE issues.
>>>
>>> Go LICENSE issue: The motivation for [1] was to satisfy pkg.go.dev's
>>> license policies [2]. Prior to the change our documentation didn't show up
>>> there [3].
>>>
>>> Java artifact LICENSE issue: The removed text contained information
>>> relevant to "convenience binary distributions". This text was added in [4]
>>> as a result of this dev@ thread [5], where we noticed that copyright
>>> notices were missing in binary artifacts. The suggested solution (that we
>>> went with) was to just add the information to the root (source) LICENSE.
>>>
>>
>> Python distribution is missing both files as well. (
>> https://issues.apache.org/jira/browse/BEAM-1746)
>>
>>
>>>
>>> I'm not sure that that solution is consistent with this ASF guide [6]
>>> which states:
>>>
>>> > The LICENSE and NOTICE files must *exactly* represent the contents of
>>> the distribution they reside in. Only components and resources that are
>>> actually included in a distribution have any bearing on the content of that
>>> distribution's NOTICE and LICENSE.
>>>
>>> I would argue that *just* Apache 2.0 is the correct text for our root
>>> (source) LICENSE, and the correct way to deal with binary artifacts is to
>>> generate per-artifact LICENSE/NOTICE files.
>>>
>>
>> I do not know how to interpret this ASF guide. As an example from another
>> project: airflow also has a LICENSE file, NOTICE file, and a licenses
>> directory. There are even overlapping mentions.
>>
>>
>>>
>>>
>>> So right now the Go issue is fixed, but the Java artifact issue has
>>> regressed. I can think of two potential solutions to resolve both:
>>> 1) Restore the "convenience binary distributions" information, and see
>>> if we can get pkg.go.dev to allow it.
>>> 2) Add infrastructure to generate LICENSE and NOTICE files for Java
>>> binary artifacts.
>>>
>>> I have no idea how we might implement (2) so (1) seems more tenable, but
>>> less correct since it's adding information not relevant to the source
>>> release.
>>>
>>> Brian
>>>
>>>
>>> [1] https://github.com/apache/beam/pull/11657
>>> [2] https://pkg.go.dev/license-policy
>>> [3]
>>> https://pkg.go.dev/github.com/apache/beam@v2.21.0+incompatible/sdks/go/pkg/beam
>>> [4] https://github.com/apache/beam/pull/5461
>>> [5]
>>> https://lists.apache.org/thread.html/6ef6630e908147ee83e1f1efd4befbda43efb2a59271c5cb49473103@%3Cdev.beam.apache.org%3E
>>> [6] https://infra.apache.org/licensing-howto.html
>>>
>>


Re: [VOTE] New Committer: Tomo Suzuki

2021-03-17 Thread Kenneth Knowles
Oh, sheesh. Very sorry. This is very inappropriate. I apologize.

Kenn

On Wed, Mar 17, 2021 at 6:00 PM Andrew Pilloud  wrote:

> Wrong list...
>
> On Wed, Mar 17, 2021 at 5:49 PM Robert Bradshaw 
> wrote:
>
>> +1
>>
>> On Wed, Mar 17, 2021 at 5:48 PM Kenneth Knowles  wrote:
>>
>>> Hi all,
>>>
>>> Please vote on the proposal for Tomo Suzuki to become a committer in the
>>> Apache Beam project, as follows:
>>>
>>> [ ] +1, Approve the proposal for the candidate to become a committer
>>> [ ] -1, Disapprove the proposal for the candidate to become a committer
>>>
>>> The vote will be open for at least 72 hours excluding weekends. It is
>>> adopted by majority approval, with at least 3 PMC affirmative votes.
>>>
>>> See discussion thread for more details [1].
>>>
>>> Kenn
>>>
>>> [1]
>>> https://lists.apache.org/thread.html/r7e1c5ba7209b598bcd1bfce515744276be8d0eddacd7a0aa688f8595%40%3Cprivate.beam.apache.org%3E
>>>
>>


Re: [VOTE] New Committer: Tomo Suzuki

2021-03-17 Thread Andrew Pilloud
Wrong list...

On Wed, Mar 17, 2021 at 5:49 PM Robert Bradshaw  wrote:

> +1
>
> On Wed, Mar 17, 2021 at 5:48 PM Kenneth Knowles  wrote:
>
>> Hi all,
>>
>> Please vote on the proposal for Tomo Suzuki to become a committer in the
>> Apache Beam project, as follows:
>>
>> [ ] +1, Approve the proposal for the candidate to become a committer
>> [ ] -1, Disapprove the proposal for the candidate to become a committer
>>
>> The vote will be open for at least 72 hours excluding weekends. It is
>> adopted by majority approval, with at least 3 PMC affirmative votes.
>>
>> See discussion thread for more details [1].
>>
>> Kenn
>>
>> [1]
>> https://lists.apache.org/thread.html/r7e1c5ba7209b598bcd1bfce515744276be8d0eddacd7a0aa688f8595%40%3Cprivate.beam.apache.org%3E
>>
>


Re: [VOTE] New Committer: Tomo Suzuki

2021-03-17 Thread Robert Bradshaw
+1

On Wed, Mar 17, 2021 at 5:48 PM Kenneth Knowles  wrote:

> Hi all,
>
> Please vote on the proposal for Tomo Suzuki to become a committer in the
> Apache Beam project, as follows:
>
> [ ] +1, Approve the proposal for the candidate to become a committer
> [ ] -1, Disapprove the proposal for the candidate to become a committer
>
> The vote will be open for at least 72 hours excluding weekends. It is
> adopted by majority approval, with at least 3 PMC affirmative votes.
>
> See discussion thread for more details [1].
>
> Kenn
>
> [1]
> https://lists.apache.org/thread.html/r7e1c5ba7209b598bcd1bfce515744276be8d0eddacd7a0aa688f8595%40%3Cprivate.beam.apache.org%3E
>


[VOTE] New Committer: Tomo Suzuki

2021-03-17 Thread Kenneth Knowles
Hi all,

Please vote on the proposal for Tomo Suzuki to become a committer in the
Apache Beam project, as follows:

[ ] +1, Approve the proposal for the candidate to become a committer
[ ] -1, Disapprove the proposal for the candidate to become a committer

The vote will be open for at least 72 hours excluding weekends. It is
adopted by majority approval, with at least 3 PMC affirmative votes.

See discussion thread for more details [1].

Kenn

[1]
https://lists.apache.org/thread.html/r7e1c5ba7209b598bcd1bfce515744276be8d0eddacd7a0aa688f8595%40%3Cprivate.beam.apache.org%3E


Re: Do we need synchronized processing time? / What to do about "continuation triggers"?

2021-03-17 Thread Robert Bradshaw
On Thu, Feb 25, 2021 at 12:53 AM Jan Lukavský  wrote:

> I get this; it makes total sense. But what else could the propagation
> meaningfully do, other than propagate the 10-second triggering to the very
> first GBK(s) and then try to push the output of these PTransforms through
> the pipeline as fast as possible? Yes, it seems it would require
> retractions, at least in cases where the DAG contains multiple paths from
> root(s) to leaf. It seems to me that the intermediate GBK(s) play no role,
> because if they do not trigger as fast as possible (and retract wrongly
> triggered outputs due to out-of-orderness), what they do is add latency
> and actually make the "sink triggering" not trigger at the configured
> frequency. Everything else seems clear to me; I just don't get this part.
> Is it possible to describe a specific example where an inner GBK would
> trigger with some trigger other than on each pane?
>
I just realized this was never answered. Yes, I do think the initial
triggering would (typically) be propagated up to the initial GBK (or
possibly even influence the Source), though note that "triggering" here
means late-data handling, accumulation mode, etc., not just the trigger.
It's possible that multiple sinks could flow up to the same roots, in which
case a triggering would be chosen that could satisfy all of them. As for
intermediate GBKs, I do not think the spec would require them to fire as
fast as possible. For example, if the request was to be up to date to the
hour, one could certainly imagine intermediate GBKs firing at some modest
fraction of an hour to balance latency with performance (and perhaps the
upstream triggers firing faster than once an hour as well, to compute
partial results rather than having expensive "data dumps" every 60
minutes).

The biggest shift, of course, is toward a more declarative API for
describing one's intent, letting the system work out the best way to
satisfy it. Many details are TBD.
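
For concreteness, a minimal Java SDK sketch of the status quo that sink
triggering would subsume: today the firing rate is pinned by hand to the
windowing upstream of each GBK (the hourly window and ten-second rate below
are illustrative choices, not a proposed API):

import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

public class ExplicitTriggeringSketch {
  // Hourly windows with speculative firings every ten seconds; every
  // downstream GBK needs its own, manually kept-in-sync configuration.
  static PCollection<KV<String, Iterable<Long>>> hourlyWithSpeculativePanes(
      PCollection<KV<String, Long>> input) {
    return input
        .apply(
            Window.<KV<String, Long>>into(FixedWindows.of(Duration.standardHours(1)))
                .triggering(
                    Repeatedly.forever(
                        AfterProcessingTime.pastFirstElementInPane()
                            .plusDelayOf(Duration.standardSeconds(10))))
                .withAllowedLateness(Duration.ZERO)
                .discardingFiredPanes())
        .apply(GroupByKey.create());
  }
}

Under sink triggering, the "update every ten seconds" goal would instead be
declared once at the sink and propagated upstream to derive an equivalent
(or coarser) trigger for each intermediate GBK.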



>  Jan
> On 2/25/21 12:44 AM, Kenneth Knowles wrote:
>
>
>
> On Wed, Feb 24, 2021 at 12:44 AM Jan Lukavský  wrote:
>
>> Hi Robert,
>>
>> > Here "sink" is really any observable outside effect, so I think "how
>> often output should be written" and "how quickly output should react to the
>> change of input" are the same.
>>
>> The difference is in the input trigger - let's imagine that I have two
>> chained GBKs (A and B). If I trigger A every minute, but B every second, I
>> will output 60 records per minute, but 59 of them will be the same. That's
>> why it seems to me that meaningful "sink" triggering has to start at the
>> input and then propagate with each pane.
>>
> The idea with sink triggering is that the level of abstraction is raised.
> You have a DoFn (more generally IO transform) that writes to some external
> system, and you request updates every ten seconds. This specification is
> propagated to cause all the GBKs in the pipeline to emit data at a rate to
> enable updates to that IO every ten seconds.
>
> Sinks will need separate configurations to handle multiple panes
> (updates/retractions) vs final values, and we can validate that a sink can
> support a particular triggering strategy. Sinks already need this, but we
> haven't solved the problem very formally or systematically. In many cases,
> these are just two different sinks - for example a CSV file with an extra
> column to indicate overwrite/retraction is really a different sink than
> just appending. They write to the same storage system, but the relationship
> of the input records to the output storage differs.
>
> There's a lot of unsolved problems in terms of exactly how the triggering
> requirements of a sink can feed back to upstream aggregations to cause them
> to trigger at appropriate times. It could be static (inferring upstream
> triggering) but seems like it might have to be dynamic (running a state
> machine at the sink that broadcasts messages). I don't think this is
> straightforward, nor is it guaranteed to be doable without knobs or some
> fresh ideas.
>
> Kenn
>
>> > As an example, if I want, say, hourly output, triggering hourly at the
>> source and then as quickly as possible from then on may be wasteful. It may
>> also be desirable to arrange such that certain transforms only have a
>> single pane per window, which is easier to propagate up than down. As
>> another example, consider accumulating vs. discarding. If I have
>> CombineValues(sum) followed by a re-keying and another CombineValues(sum),
>> and I want the final output to be accumulating, the first must be
>> discarding (or, better, retractions). Propagating upwards is possible in a
>> way propagating downward is not.
>>
>> I'm not sure I understand this. If I want hourly output, I cannot trigger
>> the source with lower frequency. If I trigger the source hourly, but do not
>> propagate this as fast as possible, I'm inevitably introducing additional
>> latency (that's the definition of "not as fast as possible") in 

Re: Missing copyright notices due to LICENSE change

2021-03-17 Thread Robert Burke
Looking at the history, it seems that before the Python text was added,
pkg.go.dev could parse the license stack just fine. It doesn't recognize the
PSF license, and fails closed entirely as a result.

I've filed an issue with pkg.go.dev (
https://github.com/golang/go/issues/45095). If the bug is fixed, the
affected versions will become visible as well.

In the meantime, we should revert my change, which clobbered the other
licenses, and probably cherry-pick it into the affected release branches.

The PSF license is annoying as it's explicitly unique. Nothing but Python
can use it and call it the PSF license. However, it is a
redistribution-friendly license, which is what matters.

On Wed, Mar 17, 2021, 3:00 PM Ahmet Altay  wrote:

> Thank you for this email.
>
>
> On Wed, Mar 17, 2021 at 2:32 PM Brian Hulette  wrote:
>
>> I just noticed that there was a recent change to our LICENSE file to make
>> it exactly match the Apache 2.0 License [1]. This seems to be the result of
>> two conflicting LICENSE issues.
>>
>> Go LICENSE issue: The motivation for [1] was to satisfy pkg.go.dev's
>> license policies [2]. Prior to the change our documentation didn't show up
>> there [3].
>>
>> Java artifact LICENSE issue: The removed text contained information
>> relevant to "convenience binary distributions". This text was added in [4]
>> as a result of this dev@ thread [5], where we noticed that copyright
>> notices were missing in binary artifacts. The suggested solution (that we
>> went with) was to just add the information to the root (source) LICENSE.
>>
>
> Python distribution is missing both files as well. (
> https://issues.apache.org/jira/browse/BEAM-1746)
>
>
>>
>> I'm not sure that that solution is consistent with this ASF guide [6]
>> which states:
>>
>> > The LICENSE and NOTICE files must *exactly* represent the contents of
>> the distribution they reside in. Only components and resources that are
>> actually included in a distribution have any bearing on the content of that
>> distribution's NOTICE and LICENSE.
>>
>> I would argue that *just* Apache 2.0 is the correct text for our root
>> (source) LICENSE, and the correct way to deal with binary artifacts is to
>> generate per-artifact LICENSE/NOTICE files.
>>
>
> I do not know how to interpret this ASF guide. As an example from another
> project: airflow also has a LICENSE file, NOTICE file, and a licenses
> directory. There are even overlapping mentions.
>
>
>>
>>
>> So right now the Go issue is fixed, but the Java artifact issue has
>> regressed. I can think of two potential solutions to resolve both:
>> 1) Restore the "convenience binary distributions" information, and see if
>> we can get pkg.go.dev to allow it.
>> 2) Add infrastructure to generate LICENSE and NOTICE files for Java
>> binary artifacts.
>>
>> I have no idea how we might implement (2) so (1) seems more tenable, but
>> less correct since it's adding information not relevant to the source
>> release.
>>
>> Brian
>>
>>
>> [1] https://github.com/apache/beam/pull/11657
>> [2] https://pkg.go.dev/license-policy
>> [3]
>> https://pkg.go.dev/github.com/apache/beam@v2.21.0+incompatible/sdks/go/pkg/beam
>> [4] https://github.com/apache/beam/pull/5461
>> [5]
>> https://lists.apache.org/thread.html/6ef6630e908147ee83e1f1efd4befbda43efb2a59271c5cb49473103@%3Cdev.beam.apache.org%3E
>> [6] https://infra.apache.org/licensing-howto.html
>>
>


Re: Missing copyright notices due to LICENSE change

2021-03-17 Thread Ahmet Altay
Thank you for this email.


On Wed, Mar 17, 2021 at 2:32 PM Brian Hulette  wrote:

> I just noticed that there was a recent change to our LICENSE file to make
> it exactly match the Apache 2.0 License [1]. This seems to be the result of
> two conflicting LICENSE issues.
>
> Go LICENSE issue: The motivation for [1] was to satisfy pkg.go.dev's
> license policies [2]. Prior to the change our documentation didn't show up
> there [3].
>
> Java artifact LICENSE issue: The removed text contained information
> relevant to "convenience binary distributions". This text was added in [4]
> as a result of this dev@ thread [5], where we noticed that copyright
> notices were missing in binary artifacts. The suggested solution (that we
> went with) was to just add the information to the root (source) LICENSE.
>

Python distribution is missing both files as well. (
https://issues.apache.org/jira/browse/BEAM-1746)


>
> I'm not sure that that solution is consistent with this ASF guide [6]
> which states:
>
> > The LICENSE and NOTICE files must *exactly* represent the contents of
> the distribution they reside in. Only components and resources that are
> actually included in a distribution have any bearing on the content of that
> distribution's NOTICE and LICENSE.
>
> I would argue that *just* Apache 2.0 is the correct text for our root
> (source) LICENSE, and the correct way to deal with binary artifacts is to
> generate per-artifact LICENSE/NOTICE files.
>

I do not know how to interpret this ASF guide. As an example from another
project: airflow also has a LICENSE file, NOTICE file, and a licenses
directory. There are even overlapping mentions.


>
>
> So right now the Go issue is fixed, but the Java artifact issue has
> regressed. I can think of two potential solutions to resolve both:
> 1) Restore the "convenience binary distributions" information, and see if
> we can get pkg.go.dev to allow it.
> 2) Add infrastructure to generate LICENSE and NOTICE files for Java binary
> artifacts.
>
> I have no idea how we might implement (2) so (1) seems more tenable, but
> less correct since it's adding information not relevant to the source
> release.
>
> Brian
>
>
> [1] https://github.com/apache/beam/pull/11657
> [2] https://pkg.go.dev/license-policy
> [3]
> https://pkg.go.dev/github.com/apache/beam@v2.21.0+incompatible/sdks/go/pkg/beam
> [4] https://github.com/apache/beam/pull/5461
> [5]
> https://lists.apache.org/thread.html/6ef6630e908147ee83e1f1efd4befbda43efb2a59271c5cb49473103@%3Cdev.beam.apache.org%3E
> [6] https://infra.apache.org/licensing-howto.html
>


Missing copyright notices due to LICENSE change

2021-03-17 Thread Brian Hulette
I just noticed that there was a recent change to our LICENSE file to make
it exactly match the Apache 2.0 License [1]. This seems to be the result of
two conflicting LICENSE issues.

Go LICENSE issue: The motivation for [1] was to satisfy pkg.go.dev's
license policies [2]. Prior to the change our documentation didn't show up
there [3].

Java artifact LICENSE issue: The removed text contained information
relevant to "convenience binary distributions". This text was added in [4]
as a result of this dev@ thread [5], where we noticed that copyright
notices were missing in binary artifacts. The suggested solution (that we
went with) was to just add the information to the root (source) LICENSE.

I'm not sure that that solution is consistent with this ASF guide [6] which
states:

> The LICENSE and NOTICE files must *exactly* represent the contents of the
distribution they reside in. Only components and resources that are
actually included in a distribution have any bearing on the content of that
distribution's NOTICE and LICENSE.

I would argue that *just* Apache 2.0 is the correct text for our root
(source) LICENSE, and the correct way to deal with binary artifacts is to
generate per-artifact LICENSE/NOTICE files.


So right now the Go issue is fixed, but the Java artifact issue has
regressed. I can think of two potential solutions to resolve both:
1) Restore the "convenience binary distributions" information, and see if
we can get pkg.go.dev to allow it.
2) Add infrastructure to generate LICENSE and NOTICE files for Java binary
artifacts.

I have no idea how we might implement (2) so (1) seems more tenable, but
less correct since it's adding information not relevant to the source
release.
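
For what it's worth, the mechanical core of (2) is small; the hard part is
enumerating what each artifact actually bundles. A rough sketch, assuming
the build can hand us each bundled dependency's name and license text (all
names here are hypothetical):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

public class ArtifactLicenseWriter {
  // Concatenates the base Apache-2.0 text with each bundled dependency's
  // license, producing a LICENSE file specific to one binary artifact.
  public static void write(
      Path apache2License, Map<String, Path> bundledDepLicenses, Path out)
      throws IOException {
    StringBuilder sb = new StringBuilder(Files.readString(apache2License));
    for (Map.Entry<String, Path> dep : bundledDepLicenses.entrySet()) {
      sb.append("\n\n========================================\n")
          .append("License for bundled dependency ")
          .append(dep.getKey())
          .append(":\n\n")
          .append(Files.readString(dep.getValue()));
    }
    Files.writeString(out, sb.toString());
  }
}

The NOTICE file could be assembled the same way, and the source LICENSE
would stay exactly Apache 2.0.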

Brian


[1] https://github.com/apache/beam/pull/11657
[2] https://pkg.go.dev/license-policy
[3]
https://pkg.go.dev/github.com/apache/beam@v2.21.0+incompatible/sdks/go/pkg/beam
[4] https://github.com/apache/beam/pull/5461
[5]
https://lists.apache.org/thread.html/6ef6630e908147ee83e1f1efd4befbda43efb2a59271c5cb49473103@%3Cdev.beam.apache.org%3E
[6] https://infra.apache.org/licensing-howto.html


Beam Website Feedback

2021-03-17 Thread ml
Ambiguous language. A lot of missing parts. I think a programming guide
must be simpler and have many more details and illustrations.




Re: BEAM-11023: tests failing on Spark Structured Streaming runner

2021-03-17 Thread Ismaël Mejía
Actually there are many reasons that could have produced this
regression even if the code of the runner has not changed at all: (1)
those tests weren't enabled before, are now, and were never passing;
or (2) the tests were changed; or (3) my principal guess: the
translation strategy of a runners-core library changed and as a side
effect the tests fail in the runner, perhaps the SDF/use_deprecated_read
changes.
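
If guess (3) is right, one cheap experiment is to rerun a failing scenario
with the pre-SDF translation forced via the use_deprecated_read experiment.
A minimal sketch (the flag is a real Beam pipeline option, but whether this
runner honors it for these tests is an assumption):

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class DeprecatedReadRepro {
  public static void main(String[] args) {
    // Force the pre-SDF Read translation; if the failures disappear with
    // this flag, the regression likely tracks the SDF-based Read changes.
    PipelineOptions options =
        PipelineOptionsFactory.fromArgs("--experiments=use_deprecated_read").create();
    Pipeline pipeline = Pipeline.create(options);
    // ... reconstruct the failing GroupByKeyTest scenario here ...
    pipeline.run().waitUntilFinish();
  }
}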


On Wed, Mar 17, 2021 at 4:44 PM Brian Hulette  wrote:
>
> You can look through the history of the PostCommit [1]. We only keep a couple 
> weeks of history, but it looks like we have one successful run from Sept 10, 
> 2020, marked as "keep forever", that ran on commit 
> 57055262e7a6bff447eef2df1e6efcda754939ca. Is that what you're looking for?
>
> (Somewhat related, I was under the impression that Jenkins always kept the 
> before/after runs around the last state change, but that doesn't seem to be 
> the case as the first failure we have is [3])
>
> Brian
>
> [1] 
> https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/
> [2] 
> https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/2049/
> [3] 
> https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/2098/
>
> On Tue, Mar 16, 2021 at 4:36 PM Fernando Morales Martinez 
>  wrote:
>>
>> Hi team,
>> it is mentioned in this WI that the tests (GroupByKeyTest testLargeKeys100MB 
>> and testGroupByKeyWithBadEqualsHashCode) stopped working around five months 
>> ago.
>> I took a look at the PRs prior to that date and couldn't find a report 
>> stating that they were working.
>>
>> Is there a way to get reports from before June 2020 (the farthest back I was 
>> able to navigate) so I can compare the tests succeeding against them failing?
>>
>> Thanks a lot!
>> - Fernando Morales
>>


Re: Adding flaky postcommit test suite

2021-03-17 Thread Kenneth Knowles
That's pretty awesome. And what nice documentation!

Clicking through https://github.com/apache/airflow/issues/10118 and
https://github.com/apache/airflow/pull/10768 it looks like the actual
quarantining / unquarantining is manual, yes? So we could reach this level
with JUnit categories for Java anyhow. We would just want a good way to get
test-level history to review, which I think the Jenkins Build History
plugin now gives us. It would be great to have automation to let us know
when a test becomes stable or flaky.
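
As a minimal sketch of the JUnit-categories approach (the marker interface
and test below are illustrative assumptions, not existing Beam code):

import org.junit.Test;
import org.junit.experimental.categories.Category;

/** Hypothetical marker interface for quarantined tests. */
interface QuarantinedTest {}

public class SomeFlakyTest {
  // Quarantined: still compiled and runnable, but excluded from the main
  // suite and exercised only by a separate scheduled job.
  @Test
  @Category(QuarantinedTest.class)
  public void testOccasionallyTimesOut() {
    // ... the flaky scenario under investigation ...
  }
}

The main suites would then exclude this category, and a scheduled
quarantine job would include only it.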

Kenn

On Tue, Mar 16, 2021 at 5:29 PM Tyson Hamilton  wrote:

> The Apache Airflow project has some interesting automation around flaky
> tests. They annotate such flaky tests as 'quarantined'; those quarantined
> tests still run (maybe even with retries?) but won't fail a test suite.
> Quarantined tests are run in a separate scheduled job; when they start
> passing, they are no longer quarantined. GitHub issues are updated with the
> status.
>
> [1]:
> https://github.com/apache/airflow/blob/master/CI.rst#scheduled-quarantined-builds
>
> On Tue, Mar 16, 2021 at 4:06 PM Kenneth Knowles  wrote:
>
>> I expect the suite to be perma-red, right? Because of one thing or
>> another flaking at all times.
>>
>> Kenn
>>
>> On Tue, Mar 16, 2021 at 2:13 PM Alex Amato  wrote:
>>
>>> Is it possible to make the presubmit auto-retry all failed tests a few
>>> times (and maybe generate a report listing the flaky tests)?
>>> Then you don't need to disable/isolate the flaky tests.
>>>
>>> If this is not possible, or hard to set up, then manually moving them to
>>> a different suite sounds like a good idea.
>>>
>>> On Tue, Mar 16, 2021 at 2:11 PM Pablo Estrada 
>>> wrote:
>>>
 Hi all,
 In Beam, we sometimes hit the issue of having one or two test cases
 that are particularly flaky, and we deactivate them.
 This is completely reasonable to me, because we need to keep good
 testing signal on our primary suites.
 The danger of deactivating these tests is that, although we have good
 practices to file JIRA issues to re-enable them, it is still easy for these
 issues and tests to be forgotten.
 Of course, ideally, the solution is "do not forget old deactivated
 tests" - and we should adopt practices to ensure that.

 I think, to strengthen our practices, we can reinforce them with a
 pragmatic choice: Instead of fully deactivating tests, we can make them run
 in a separate suite of Flaky tests. Why would this help?

 - It would allow us to make sure that flaky tests continue to *be able
 to run*.
 - It would remind us that we have flaky tests that need fixing.
 - It would allow us to experiment with fixes to these tests on the Flaky
 suite, and once they're reliable, move them to the main suite.

 Does this make sense to others?
 Best
 -P.

>>>


Re: Null checking in Beam

2021-03-17 Thread Kenneth Knowles
On Wed, Mar 17, 2021 at 2:49 AM Jan Lukavský  wrote:

> If there is no way to configure which annotations should be generated,
> then I'd be +1 for moving the checker to a separate CI job and adding an
> opt-in flag for the check when run locally.
>
Yes. If this answer is correct, then we are out of luck:
https://stackoverflow.com/questions/57929137/disable-nonnull-annotation-on-implement-methods

A good followup to moving it to a separate CI job would be to attach a full
type checking run to the `:check` gradle task. This should be the best
place to attach all of our extended checks like spotbugs, spotless,
checkstyle, checkerframework. I rarely run `:check` myself (I think it is
currently broken in some ways) but it may be good to start.

BTW the slowdown we need to solve is local builds, not CI runs. When it was
added the slowdown was almost completely prevented by caching. And now that
we disable it for test classes (where for some reason it was extra slow) I
wonder if it will speed up the main CI runs at all. So I would say the
first thing to do is to just disable it by default but enable it in the
Jenkins job builders.

Kenn

> We need to solve the issue for dev@ as well, as the undesirable annotations
> are already digging their way into the code base:
>
> git/apache/beam$ git grep UnknownKeyFor
>
> Another strange thing is that it seems that we are pulling in
> checkerframework as a runtime dependency (at least for some submodules).
> When I run `mvn dependency:tree` on one of my projects that uses Maven, I see
>
> [INFO] +- com.google.guava:guava:jar:30.1-jre:provided
> [INFO] |  +- com.google.guava:failureaccess:jar:1.0.1:provided
> [INFO] |  +- com.google.guava:listenablefuture:jar:9999.0-empty-to-avoid-conflict-with-guava:provided
> [INFO] |  +- org.checkerframework:checker-qual:jar:3.5.0:provided
>
> which is fine, but when I add beam-sdks-java-extensions-euphoria it
> changes to
>
> [INFO] +- org.apache.beam:beam-sdks-java-extensions-euphoria:jar:2.28.0:compile
> [INFO] |  \- org.checkerframework:checker-qual:jar:3.7.0:compile
>
> I'm not a gradle guru, so I cannot tell how this happens; there seems to
> be nothing special about euphoria in the Gradle build.
>
>  Jan
> On 3/16/21 7:12 PM, Kenneth Knowles wrote:
>
> I've asked on the checkerframework users mailing list if they or any users
> have recommendations for the IntelliJ integration issue.
>
> It is worth noting that the default annotations inserted into the bytecode
> do add value: the @NonNull type annotations are default for
> checkerframework but not default for spotbugs. So having the default
> inserted enables downstream users to have better spotbugs heuristic
> analysis. Further, the defaults can be adjusted by users so freezing them
> at the values we use them at is valuable in general across all tools.
>
> I think it makes sense to sacrifice these minor value-adds to keep things
> simple-looking for IntelliJ users.
>
> Kenn
>
> On Tue, Mar 16, 2021 at 10:05 AM Kenneth Knowles  wrote:
>
>> Seems it is an FAQ with no solution:
>> https://checkerframework.org/manual/#faq-classfile-annotations
>>
>> On Tue, Mar 16, 2021 at 10:01 AM Kenneth Knowles  wrote:
>>
>>> Adding -PskipCheckerframework when releasing will solve it for users,
>>> but not for dev@.
>>>
>>> Making it off by default and a separate CI check that turns it on would
>>> solve it overall but has the downsides mentioned before.
>>>
>>> It would be very nice to simply have a flag to not insert default
>>> annotations.
>>>
>>> Kenn
>>>
>>> On Tue, Mar 16, 2021 at 9:37 AM Jan Lukavský  wrote:
>>>
 I believe it is not a problem with IDEA. At least I didn't notice
 behavior like that with Guava, although Guava uses the framework as well.
 Maybe there is a way to tune which annotations should be generated? Or does
 Guava handle that somehow differently?
 On 3/16/21 5:22 PM, Reuven Lax wrote:

 I've also been annoyed at IntelliJ autogenerating all these
 annotations. I believe Kenn said that this was not the intention - maybe
 there's an IntelliJ setting that would stop this from happening?

 On Tue, Mar 16, 2021 at 2:14 AM Jan Lukavský  wrote:

> I don't know the details of the checkerframework, but there seems to be a
> contradiction between what code is (currently) generated (and what we
> therefore release) and what the checkerframework documentation actually states [1]:
>
> @UnknownKeyFor:
>
> Used internally by the type system; should never be written by a
> programmer.
>
> If this annotation is generated for overridden methods, then I'd say
> that it places a great burden on our users - either not using
> autogenerated methods, or erasing all the generated annotations afterwards.
> Either way, that is not "polite" of Beam.
>
> What we should judge is not only the formal purity of the code, but what
> stands on the other side: how usable the APIs we provide are. We should not
> head for 100% 

Re: BEAM-11023: tests failing on Spark Structured Streaming runner

2021-03-17 Thread Brian Hulette
You can look through the history of the PostCommit [1]. We only keep a
couple weeks of history, but it looks like we have one successful run from
Sept 10, 2020, marked as "keep forever", that ran on commit
57055262e7a6bff447eef2df1e6efcda754939ca.
Is that what you're looking for?

(Somewhat related, I was under the impression that Jenkins always kept the
before/after runs around the last state change, but that doesn't seem to be
the case as the first failure we have is [3])

Brian

[1]
https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/
[2]
https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/2049/
[3]
https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/2098/

On Tue, Mar 16, 2021 at 4:36 PM Fernando Morales Martinez <
fernando.mora...@wizeline.com> wrote:

> Hi team,
> it is mentioned in this WI that the tests (GroupByKeyTest
> testLargeKeys100MB and testGroupByKeyWithBadEqualsHashCode) stopped working
> around five months ago.
> I took a look at the PRs prior to that date and couldn't find a report
> stating that they were working.
>
> Is there a way to get reports from before June 2020 (the farthest back I
> was able to navigate) so I can compare the tests succeeding against them
> failing?
>
> Thanks a lot!
> - Fernando Morales
>


Re: Access to Beam contributor list

2021-03-17 Thread Alexey Romanenko
Done.

Welcome to Beam!

Alexey

> On 16 Mar 2021, at 19:59, César Cueva Orozco  wrote:
> 
> Hello Team,
> 
> Could you please add my user name CesarCueva19 to the contributor list?
> 
> Thank you
> 



Re: Null checking in Beam

2021-03-17 Thread Jan Lukavský
If there is no way to configure which annotations should be generated,
then I'd be +1 for moving the checker to a separate CI job and adding an
opt-in flag for the check when run locally.


We need to solve the issue for dev@ as well, as the undesirable
annotations are already digging their way into the code base:


git/apache/beam$ git grep UnknownKeyFor
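
For illustration, this is roughly the shape of what IntelliJ's "implement
methods" action produces against the released jars (the transform is a
made-up example; the exact annotation set can vary):

import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.values.PCollection;
import org.checkerframework.checker.initialization.qual.Initialized;
import org.checkerframework.checker.nullness.qual.NonNull;
import org.checkerframework.checker.nullness.qual.UnknownKeyFor;

public class MyTransform extends PTransform<PCollection<String>, PCollection<String>> {
  // The checker's internal default annotations, frozen into the released
  // bytecode, are copied verbatim into user code by the IDE.
  @Override
  public @UnknownKeyFor @NonNull @Initialized PCollection<String> expand(
      PCollection<String> input) {
    return input;
  }
}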

Another strange thing is that it seems that we are pulling in
checkerframework as a runtime dependency (at least for some submodules).
When I run `mvn dependency:tree` on one of my projects that uses Maven, I see


[INFO] +- com.google.guava:guava:jar:30.1-jre:provided
[INFO] |  +- com.google.guava:failureaccess:jar:1.0.1:provided
[INFO] |  +- com.google.guava:listenablefuture:jar:9999.0-empty-to-avoid-conflict-with-guava:provided
[INFO] |  +- org.checkerframework:checker-qual:jar:3.5.0:provided

which is fine, but when I add beam-sdks-java-extensions-euphoria it 
changes to


[INFO] +- org.apache.beam:beam-sdks-java-extensions-euphoria:jar:2.28.0:compile
[INFO] |  \- org.checkerframework:checker-qual:jar:3.7.0:compile

I'm not a gradle guru, so I cannot tell how this happens; there seems to
be nothing special about euphoria in the Gradle build.


 Jan

On 3/16/21 7:12 PM, Kenneth Knowles wrote:
I've asked on the checkerframework users mailing list if they or any users
have recommendations for the IntelliJ integration issue.


It is worth noting that the default annotations inserted into the 
bytecode do add value: the @NonNull type annotations are default for 
checkerframework but not default for spotbugs. So having the default 
inserted enables downstream users to have better spotbugs heuristic
analysis. Further, the defaults can be adjusted by users so freezing 
them at the values we use them at is valuable in general across all tools.


I think it makes sense to sacrifice these minor value-adds to keep 
things simple-looking for IntelliJ users.


Kenn

On Tue, Mar 16, 2021 at 10:05 AM Kenneth Knowles  wrote:


Seems it is an FAQ with no solution:
https://checkerframework.org/manual/#faq-classfile-annotations


On Tue, Mar 16, 2021 at 10:01 AM Kenneth Knowles  wrote:

Adding -PskipCheckerframework when releasing will solve it for
users, but not for dev@.

Making it off by default and a separate CI check that turns it
on would solve it overall but has the downsides mentioned before.

It would be very nice to simply have a flag to not insert
default annotations.

Kenn

On Tue, Mar 16, 2021 at 9:37 AM Jan Lukavský  wrote:

I believe it is not a problem with IDEA. At least I didn't
notice behavior like that with Guava, although Guava uses
the framework as well. Maybe there is a way to tune which
annotations should be generated? Or does Guava handle that
somehow differently?

On 3/16/21 5:22 PM, Reuven Lax wrote:

I've also been annoyed at IntelliJ autogenerating all
these annotations. I believe Kenn said that this was not
the intention - maybe there's an IntelliJ setting that
would stop this from happening?

On Tue, Mar 16, 2021 at 2:14 AM Jan Lukavský  wrote:

I don't know the details of the checkerframework, but
there seems to be a contradiction between what code is
(currently) generated (and what we therefore release)
and what the checkerframework documentation actually states [1]:

@UnknownKeyFor:

Used internally by the type system; should never be
written by a programmer.

If this annotation is generated for overridden
methods, then I'd say that it places a great
burden on our users - either not using autogenerated
methods, or erasing all the generated annotations
afterwards. Either way, that is not "polite" of Beam.

What we should judge is not only the formal purity of
the code, but what stands on the other side: how usable
the APIs we provide are. We should not head for 100% pure
code while sacrificing usage comfort. Every check that
leaks into user code comes at a price, and we should not
ignore that. No free lunch.

From my point of view:

 a) if a check does not modify the bytecode, it is
fine and we can use it - we are absolutely free to
use any tooling we agree on if our users cannot be
affected in any way

 b) if a tool needs to leak to the user, the leakage
should be as small as possible

 c)