[VOTE] Release 2.23.0, release candidate #1

2020-07-09 Thread Valentyn Tymofieiev
Hi everyone,

Please review and vote on release candidate #1 for version 2.23.0,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2],
which is signed with the key with fingerprint 1DF50603225D29A4 [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "v2.23.0-RC1" [5],
* website pull request listing the release [6], publishing the API
reference manual [7], and the blog post [8].
* Java artifacts were built with Maven 3.6.0 and Oracle JDK 1.8.0_201-b09.
* Python artifacts are deployed along with the source release to
dist.apache.org [2].
* Validation sheet with a tab for 2.23.0 release to help with validation
[9].
* Docker images published to Docker Hub [10].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Release Manager

[1]
https://jira.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12347145
[2] https://dist.apache.org/repos/dist/dev/beam/2.23.0/
[3] https://dist.apache.org/repos/dist/release/beam/KEYS
[4] https://repository.apache.org/content/repositories/orgapachebeam-1105/
[5] https://github.com/apache/beam/tree/v2.23.0-RC1
[6] https://github.com/apache/beam/pull/12212
[7] https://github.com/apache/beam-site/pull/605
[8] https://github.com/apache/beam/pull/12213
[9]
https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=596347973
[10] https://hub.docker.com/search?q=apache%2Fbeam&type=image


Season of Docs 2020 Proposal for Apache Beam (Sruthi Sree Kumar)

2020-07-09 Thread Season of Docs
Below is a project proposal from a technical writer (bcc'd) who wants to
work with your organization on a Season of Docs project. Please assess the
proposal and ensure that you have a mentor to work with the technical
writer.

If you want to accept the proposal, please submit the technical writing
project to the Season of Docs program administrators using the project
selection form, which is also available in the guide for organization
administrators.


The deadline for project selections is July 31, 2020 at 20:00 UTC. For
other program deadlines, please see the full timeline on the Season of
Docs website.

If you have any questions about the program, please email the Season of
Docs team at season-of-docs-supp...@googlegroups.com.

Best,
The Google Season of Docs team


Title: Update of the runner comparison page / capability matrix
Project length: Standard length (3 months)
Writer information
*Name:* Sruthi Sree Kumar
*Email:* sruthiskuma...@gmail.com
*Résumé/CV:*
https://drive.google.com/file/d/12RtM7Obz2Fog-AcIJAX1kLCKPPytY2Hq/view?usp=sharing
*Sample:* https://medium.com/big-data-processing
*Additional information:* I, Sruthi Sree Kumar, am a dual-degree master's
student in Cloud Computing and Services. Currently, I am writing my master's
thesis on the Apache Flink state management API with the Continuous Deep
Analytics research group at the Research Institute of Sweden (RISE). Before
my master's, I had 4 years of work experience as a backend developer. I would
like to participate in Season of Docs because I have found projects that are
related to my current work, my areas of interest, and my future career path.
I am an active user of open source projects such as Apache Beam and Apache
Flink. I also started a technical blog earlier this year with content
focusing on algorithms and concepts in distributed systems and distributed
processing systems.
Project Description
Apache Beam is a unified platform for defining both batch and stream
processing pipelines. Apache Beam lets you define a model to represent and
transform datasets irrespective of any specific data processing platform.
Once defined, you can run it on any of the supported run-time frameworks
(runners), which include Apache Apex, Apache Flink, Apache Spark, and Google
Cloud Dataflow. Apache Beam also comes with different SDKs which let you
write your pipeline in programming languages such as Java, Python, and Go.

I am submitting my application for the GSoD project "Update of the runner
comparison page/capability matrix". As Apache Beam supports multiple runners
and SDKs, a new user can be confused about choosing between them. The
current documentation for the different runners gives only a very brief
overview of each runner. My idea is to add more comprehensive details about
each runner on its documentation page. I would also like to update the
description of the example word count project with a more detailed
explanation. For this, my plan is to try every word count example locally on
my machine, find out whether any steps are missing, and add more explanation
of the process. Another thing I have noticed is that the documentation for
the runners does not follow a consistent pattern (a few have an overview
section while others start with how-to-use instructions, prerequisites, or
some other heading). I will update all of them to follow a single, simple
pattern.

I plan to add a new page that describes each runner and provides a
descriptive narration of each of them [BEAM-3220]. From this page, users can
navigate to the detailed description page of each runner and to the
capability matrix. I also plan to add a descriptive comparison of the
runners here. Currently, I am using Beam NEXMark to benchmark Flink runners
for my master's thesis. As I am familiar with NEXMark benchmarking, I would
like to include the benchmarking results of each runner in both batch and
streaming mode here (BEAM-2944). I would also update the NEXMark
documentation if I find that any parameters or configuration options are
missing or have been removed. When I first used the Flink runner, I was
stuck initially because one of the parameters was missing from the
documentation [
https://lists.apache.org/thread.html/re71e8298e0c13180a4ab0ac6a65e808e1d82ce85e955778cf1089553%40%3Cuser.beam.apache.org%3E].
Now that I am more familiar with the NEXMark code base, it will be easier
for me to benchmark the runners and add the metrics. On this same page, I
would like to include a brief summary of the production readiness of each
runner.

In the current documentation, the support for classic/portable runners is
included on each runner's description page. I think it would be better to
bring them all together in one place, either in the capability matrix or on
the newly added description page. Also, currently, the portabi

Season of Docs 2020 Proposal for Apache Beam (Ayeshmantha)

2020-07-09 Thread Season of Docs
Below is a project proposal from a technical writer (bcc'd) who wants to
work with your organization on a Season of Docs project. Please assess the
proposal and ensure that you have a mentor to work with the technical
writer.

If you want to accept the proposal, please submit the technical writing
project to the Season of Docs program administrators using the project
selection form, which is also available in the guide for organization
administrators.


The deadline for project selections is July 31, 2020 at 20:00 UTC. For
other program deadlines, please see the full timeline on the Season of
Docs website.

If you have any questions about the program, please email the Season of
Docs team at season-of-docs-supp...@googlegroups.com.

Best,
The Google Season of Docs team


Title: Deployment of a Flink and Spark Clusters with Portable Beam
Project length: Standard length (3 months)
Writer information
*Name:* Ayeshmantha
*Email:* aka.perer...@gmail.com
*Résumé/CV:*
https://drive.google.com/file/d/11LBRBl2f1zpsBPytXv5530zUPjBSIZTD/view?usp=sharing


Writing experience:
Experience 1:
*Title:* Google Season Of Docs
*Date:* 2019 (last season)
*Description:* The primary objective of the project is to develop
interactive documentation where end users can try out APIs directly, with
more descriptive, self-explanatory text that tells both technical and
non-technical people in which situations each endpoint should be used.

The current Swagger environment is great, but without proper documentation
it is still hard for both technical and non-technical users to get an idea
of the API directly. The main idea is to bring the Swagger environment and
the documentation together in one place with a clean presentation, with the
help of JS, HTML, and CSS.
*Summary:* https://medium.com/@ayeshmanthaperera/gsod19-openmrs-4259aa6356f1
https://rest.openmrs.org/

*Sample:* https://rest.openmrs.org/
*Additional information:* https://akayeshmantha.github.io/
Project Description
The Apache Beam vision has been to provide a framework for users to write
and execute pipelines in the programming language of their choice, on the
runner of their choice. As the reality of Beam has evolved towards this
vision, the way in which Beam is run on top of runners such as Apache Spark
and Apache Flink has changed.

These changes are documented in the wiki and in design documents, and are
accessible to Beam contributors, but they are not available in the
user-facing documentation. This has been a barrier to adoption for other
users of Beam.

Project deliverables
This project involves improving the Flink Runner page to include strategies
for deploying Beam on a few different environments: a Kubernetes cluster, a
Google Cloud Dataproc cluster, and an AWS EMR cluster. There are other
places in the documentation that should also be updated in this regard.

After working on the Flink Runner, similar updates should be made to the
Spark Runner page [2] and the getting-started documentation [3].

Project Background Materials
There are currently a number of users of the Apache Beam portable runners,
and there have been a few different write-ups and attempts at summarizing
how to easily run Apache Beam on the different runners. Some examples are:

A guide for running Beam on Flink with the k8s operator developed by
Google [6]
A few issues from users asking about running Beam on one of these runners,
e.g. BEAM-8970 - Spark portable runner supports Yarn (OPEN)

Furthermore, valuable reading materials regarding configurations for SDK
workers and Docker containers are in the following resources:


Season of Docs 2020 Proposal for Apache Beam (Basavraj)

2020-07-09 Thread Season of Docs
Below is a project proposal from a technical writer (bcc'd) who wants to
work with your organization on a Season of Docs project. Please assess the
proposal and ensure that you have a mentor to work with the technical
writer.

If you want to accept the proposal, please submit the technical writing
project to the Season of Docs program administrators using the project
selection form, which is also available in the guide for organization
administrators.


The deadline for project selections is July 31, 2020 at 20:00 UTC. For
other program deadlines, please see the full timeline on the Season of
Docs website.

If you have any questions about the program, please email the Season of
Docs team at season-of-docs-supp...@googlegroups.com.

Best,
The Google Season of Docs team


Title: Deployment of a Flink and Spark Clusters with Portable Beam
Project length: Standard length (3 months)
Writer information
*Name:* Basavraj
*Email:* 1by17me...@bmsit.in
Project Description
The vision is to provide a framework for users to write and execute
pipelines in the programming language of their choice, on the runner of
their choice. As the reality of Beam has evolved towards this vision, the
way in which Beam is run on top of runners such as Apache Spark and Apache
Flink has changed.

These changes are documented in the wiki and in design documents, and are
accessible to Beam contributors, but they are not available in the
user-facing documentation. This has been a barrier to adoption for other
users of Beam.


Re: Streaming pipeline "most-recent" join

2020-07-09 Thread Reza Rokni
Hya,

I never got a chance to finish this one, maybe I will get some time in the
summer break... but I think it will help with your use case...

https://github.com/rezarokni/beam/blob/BEAM-7386/sdks/java/extensions/timeseries/src/main/java/org/apache/beam/sdk/extensions/timeseries/joins/BiTemporalStreams.java

Cheers
Reza

On Fri, Jul 10, 2020 at 8:58 AM Harrison Green 
wrote:

> Hi Beam devs,
>
> I'm working on a streaming pipeline where we need to do a "most-recent"
> join between two PCollections. Specifically, something like:
>
> out = pcoll1 | beam.Map(lambda a,b: (a,b),
> b=beam.pvalue.AsSingleton(pcoll2))
>
> The goal is to join each value in pcoll1 with only the most recent value
> from pcoll2. (in this case pcoll2 is much more sparse than pcoll1)
> ---
> altay@ suggested using a global window for the side-input pcollection
> with a trigger on each element. I've been trying to simulate this behavior
> locally with beam.testing.TestStream but I've been running into some issues.
>
> Specifically, the Repeatedly(AfterCount(1)) trigger seems to work
> correctly, but the side input receives too many panes (even when using
> discarding accumulation). I've set up a minimal demo here:
> https://colab.research.google.com/drive/1K0EqcKWxa4UK3SrkLBeHs7HSynw_VfSZ?usp=sharing
> In this example, I'm trying to join values from pcollection "a" with
> pcollection "b". However each pane of pcollection "a" is able to "see" all
> of the panes from pcollection "b" which is not what I would expect.
>
> I am curious if anyone has advice for how to handle this type of problem
> or an alternative solution for the "most-recent" join. (side note: I was
> able to hack together an alternative solution that uses a custom
> window/windowing strategy but it was fairly complex and I think a strategy
> that uses GlobalWindows would be preferred).
>
> Sincerely,
> Harrison
>


Streaming pipeline "most-recent" join

2020-07-09 Thread Harrison Green
Hi Beam devs,

I'm working on a streaming pipeline where we need to do a "most-recent"
join between two PCollections. Specifically, something like:

out = pcoll1 | beam.Map(lambda a,b: (a,b),
b=beam.pvalue.AsSingleton(pcoll2))

The goal is to join each value in pcoll1 with only the most recent value
from pcoll2. (in this case pcoll2 is much more sparse than pcoll1)
---
altay@ suggested using a global window for the side-input pcollection with
a trigger on each element. I've been trying to simulate this behavior
locally with beam.testing.TestStream but I've been running into some issues.
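
Concretely, the windowing I'm applying to the side-input collection looks
roughly like this (a simplified sketch; the trigger and accumulation settings
are the parts I'm experimenting with):

import apache_beam as beam
from apache_beam.transforms import trigger, window

# Keep only the latest pane of pcoll2 in a single global window,
# re-firing on every new element and discarding previous panes.
latest = pcoll2 | beam.WindowInto(
    window.GlobalWindows(),
    trigger=trigger.Repeatedly(trigger.AfterCount(1)),
    accumulation_mode=trigger.AccumulationMode.DISCARDING)

out = pcoll1 | beam.Map(lambda a, b: (a, b),
                        b=beam.pvalue.AsSingleton(latest))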

Specifically, the Repeatedly(AfterCount(1)) trigger seems to work
correctly, but the side input receives too many panes (even when using
discarding accumulation). I've set up a minimal demo here:
https://colab.research.google.com/drive/1K0EqcKWxa4UK3SrkLBeHs7HSynw_VfSZ?usp=sharing
In this example, I'm trying to join values from pcollection "a" with
pcollection "b". However each pane of pcollection "a" is able to "see" all
of the panes from pcollection "b" which is not what I would expect.

I am curious if anyone has advice for how to handle this type of problem or
an alternative solution for the "most-recent" join. (side note: I was able
to hack together an alternative solution that uses a custom
window/windowing strategy but it was fairly complex and I think a strategy
that uses GlobalWindows would be preferred).

Sincerely,
Harrison


Re: Versioning published Java containers

2020-07-09 Thread Kyle Weaver
My main question is, are we confident the Java 11 container is ready to
release? AFAIK there are still a number of issues blocking full Java 11
support (cf. [1]; not sure how many of these, if any, affect the SDK
harness specifically, though).

For comparison, we recently decided to stop publishing Go SDK containers
until the Go SDK is considered mature [2]. In the meantime, those who want
to use the Go SDK can build their own container images from source.

Do we already have a Gradle task to build Java 11 containers? If not, this
would be a good intermediate step, letting users opt in to Java 11 without
us overpromising support.

When we eventually do the renaming, we can add a note to CHANGES.md [3].

[1] https://issues.apache.org/jira/browse/BEAM-10090
[2] https://issues.apache.org/jira/browse/BEAM-9685
[3] https://github.com/apache/beam/blob/master/CHANGES.md

On Thu, Jul 9, 2020 at 3:44 PM Emily Ye  wrote:

> Hi all,
>
> I'm getting ramped up on contributing and was looking into adding the Java
> 11 harness container to releases (
> https://issues.apache.org/jira/browse/BEAM-8106) - should I rename the
> current java container so we have two new images `beam_java8_sdk` and
> `beam_java11_sdk` or hold off on renaming? If we do rename it, what steps
> should I take to announce/document the change?
>
> Thanks,
> Emily
>


Interactive Beam Side Panel in JupyterLab

2020-07-09 Thread Ning Kang
Hi everyone,

Here is a design doc about adding a JupyterLab extension for Interactive
Beam.


   - Interactive Beam provides an InteractiveRunner that supports a series
   of features to interactively explore the pipeline states and PCollection
   data when creating and executing Beam pipelines in REPL-like notebook
   environments.


   - JupyterLab is the open-source notebook runtime created by Project
   Jupyter, born out of IPython (it's also known as Jupyter Notebook in the
   pre-lab era).


   - A JupyterLab extension (labextension) is a node package that can be
   installed in JupyterLab to support customized integration with the notebook
   runtime itself.

The proposed extension provides a new way for users to interact with Beam
pipelines and PCollections in JupyterLab from a customized UI instead of
writing/running/removing/re-arranging non-core-Beam code to be executed in
kernels.
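
For context, today that kind of interaction means running snippets like the
following in notebook cells (a rough sketch of current usage; the exact
Interactive Beam helpers shown, e.g. ib.show, may differ from what the design
doc proposes to surface in the side panel):

import apache_beam as beam
from apache_beam.runners.interactive.interactive_runner import InteractiveRunner
import apache_beam.runners.interactive.interactive_beam as ib

# Build a pipeline on the InteractiveRunner inside a notebook kernel.
p = beam.Pipeline(InteractiveRunner())
words = p | beam.Create(['to', 'be', 'or', 'not', 'to', 'be'])
counts = words | beam.combiners.Count.PerElement()

# Materialize and display the PCollection in the notebook output area.
ib.show(counts)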

It also lays the foundation for building/integrating/presenting useful Beam
features in notebook environments with customizable UX.

Since the package will later be published to NPM,
I also want to collect your opinions about the package naming.

Here are some proposed names:

   - apache-beam-sidepanel
   - apache-beam-interactive-sidepanel
   - apache-beam-jupyterlab-sidepanel


Please feel free to comment in the doc and leave your naming preference.
We'll also start a vote later to finalize the name.

Thanks!

Ning.


Versioning published Java containers

2020-07-09 Thread Emily Ye
Hi all,

I'm getting ramped up on contributing and was looking into adding the Java
11 harness container to releases (
https://issues.apache.org/jira/browse/BEAM-8106) - should I rename the
current java container so we have two new images `beam_java8_sdk` and
`beam_java11_sdk` or hold off on renaming? If we do rename it, what steps
should I take to announce/document the change?

Thanks,
Emily


[Proposal] - Publish Content for Apache Beam Channels

2020-07-09 Thread Brittany Hermann
Hi folks,



I wanted to share some exciting news that Google is starting to leverage
Hootsuite and Brandwatch as social listening and engagement tools to
streamline engagement and measure impact of content produced by open source
projects. We will do this to listen to our user and contributor
communities, identify use cases, gather feature requests, and engage with
people who need support.



Proposal to PMC

Connect Apache Beam’s social media accounts to Hootsuite and Brandwatch to
produce content and monitor impact. This request would enable us to execute
on the communication strategy that Maria Cruz has developed for Apache
Beam [1].



To do this, I’d like to ask the PMC if I could get access to the Apache
Beam social media accounts so I can connect them to these platforms and
produce content for the project.



By having access to the Apache Beam channels, I could help monitor and
answer questions, capture and send questions over to the mailing list, and
engage with the community. I could also commit to sharing with our community
a monthly report on Beam’s brand value, the feature requests and bugs
identified, and the impact of the content produced and published through
the platform.



I would love to hear your thoughts!


--



Additional Context



What is Hootsuite?

Hootsuite is a social media management system that is used to help track
and manage social network channels. It enables companies to monitor what
people are saying about the brand and helps teams respond instantly.



What is Brandwatch?

Brandwatch is a social listening platform that is used to get access to the
world’s largest library of consumer registration. Brandwatch helps
companies understand positioning within the market by accessing data from
100 million sources and over 1.3 trillion posts.



Why are we using a social listening tool?

We want to listen to our user and contributor communities to identify areas
where we can support and engage them.



What are the benefits of using Hootsuite and Brandwatch?

   - Understand Apache Beam’s positioning in the Streaming Analytics Market
   - Increase brand recognition through SEO-optimized content
   - Engage with new users
   - Help and support users who need help



What social media platforms will we be listening to?

The social media platforms that we will monitor are: Twitter, LinkedIn,
Reddit, Medium, Stack Overflow, GitHub, Jira, and Pony Mail.

[1] https://cwiki.apache.org/confluence/display/BEAM/Communication+strategy


-- 

Brittany Hermann

Open Source Program Manager (Provided by Adecco Staffing)

1190 Bordeaux Drive, Building 4, Sunnyvale, CA 94089



Re: Season of Docs Interest

2020-07-09 Thread Aizhamal Nurmamat kyzy
Hey Sharon!

Thank you so much for your interest in contributing to Beam's documentation.
It is a big help that you have knowledge and experience with Spark and
Dataflow already.

In order to be considered for the program, you'll need to submit a proposal
with a summary of the documentation work that you would like to complete
while working with us on Apache Beam. I am adding +Pablo Estrada, the
assigned mentor for this project idea, in case you have specific questions
regarding the capability matrix. You can also work with me or Pablo if you
want feedback on your initial draft; for that, consider using a Google Doc
and sharing it with us.

Also check out the guides for technical writers provided by the GSoD team
[1]. They have many tips on how to make your proposal stand out.

Hope it helps, and let me know if you have any questions.
Aizhamal

[1]
https://developers.google.com/season-of-docs/docs/tech-writer-application-hints


On Wed, Jul 8, 2020 at 12:34 PM Sharon Lin  wrote:

> Hi Aizhamal,
>
> I'm a 4th year bachelors student at MIT studying computer science, and I'm
> interested in working with Apache Beam for Season of Docs! I recognize that
> it's close to the application deadline, but I'm an avid user of Apache
> Spark and would really love to help with documenting tools for developers.
>
> I'm interested in working on the update of the runner comparison page /
> capability matrix. I've set up Spark and Dataflow before, and I believe I
> have the necessary background to get started on the deliverables once the
> program begins.
>
> I've attached my resume if that's helpful. Thanks, and I hope to work with
> you!
>
> Best,
> Sharon Lin
> Department of EECS
> Massachusetts Institute of Technology
>
>


Re: Finer-grained test runs?

2020-07-09 Thread Robert Bradshaw
It does sound like we're generally on the same page. Minor comments below.

On Thu, Jul 9, 2020 at 1:00 PM Kenneth Knowles  wrote:
>
> On Thu, Jul 9, 2020 at 11:47 AM Robert Bradshaw  wrote:
>>
>> On Thu, Jul 9, 2020 at 8:40 AM Luke Cwik  wrote:
>> >
>> >> If Brian's: it does not result in redundant build (if plugin works) since 
>> >> it would be one Gradle build process. But it does do a full build if you 
>> >> touch something at the root of the ancestry tree like core SDK or model. 
>> >> I would like to avoid automatically testing descendants if we can, since 
>> >> things like Nexmark and most IOs are not sensitive to the vast majority 
>> >> of model or core SDK changes. Runners are borderline.
>> >
>> > I believe that the cost of fixing an issue that is found later once the 
>> > test starts failing because the test wasn't run as part of the PR has a 
>> > much higher order of magnitude of cost to triage and fix. Mostly due to 
>> > loss of context from the PR author/reviewer and if the culprit PR can't be 
>> > found then whoever is trying to fix it.
>>
>> Huge +1 to this.
>
>
> Totally agree. This abstract statement is clearly true. I suggest considering 
> things more concretely.
>
>> Ideally we could count on the build system (and good caching) to only
>> test what actually needs to be tested, and with work being done on
>> runners and IOs this would be a small subset of our entire suite. When
>> working lower in the stack (and I am prone to do) I think it's
>> acceptable to have longer wait times--and would *much* rather pay that
>> price than discover things later. Perhaps some things could be
>> surgically removed (it would be interesting to mine data on how often
>> test failures in the "leaves" catch real issues), but I would do that
>> with care. That being said, flakiness is really an issues (and it
>> seems these days I have to re-run tests, often multiple times, to get
>> a PR to green; splitting up jobs could help that as well).
>
> Agree with your sentiment that a longer wait for core changes is generally 
> fine; my phrasing above overemphasized this case. Anecdotally, without mining 
> data, leaf modules do catch bugs in core changes sometimes when (by 
> definition) they are not adequately tested. This is a good measure for how 
> much we have to improve our engineering practices.
>
> But anyhow this is one very special case. Coming back to the overall issue, 
> what we actually do today is run all leaf/middle/root builds whenever 
> anything in any leaf/middle/root layer is changed. And we track greenness and 
> flakiness at this same level of granularity.

I wonder how hard it would be to track greenness and flakiness at the
level of gradle project (or even lower), viewed hierarchically.

> Recall my (non-binding) starting point guessing at what tests should or 
> should not run in some scenarios: (this tangent is just about the third one, 
> where I explicitly said maybe we run all the same tests and then we want to 
> focus on separating signals as Luke pointed out)
>
> > - changing an IO or runner would not trigger the 20 minutes of core SDK 
> > tests
> > - changing a runner would not trigger the long IO local integration tests
> > - changing the core SDK could potentially not run as many tests in 
> > presubmit, but maybe it would and they would be separately reported results 
> > with clear flakiness signal
>
> And let's consider even more concrete examples:
>
>  - when changing a Fn API proto, how important is it to run RabbitMqIOTest?
>  - when changing JdbcIO, how important is it to run the Java SDK 
> needsRunnerTests? RabbitMqIOTest?
>  - when changing the FlinkRunner, how important is it to make sure that 
> Nexmark queries still match their models when run on direct runner?
>
> I chose these examples to all have zero value, of course. And I've 
> deliberately included an example of a core change and a leaf test. Not all 
> (core change, leaf test) pairs are equally important. The vast majority of 
> all tests we run are literally unable to be affected by the changes 
> triggering the test. So that's why enabling Gradle cache or using a plugin 
> like Brian found could help part of the issue, but not the whole issue, again 
> as Luke reminded.

For (2) and (3), I would hope that the build dependency graph could
exclude them. You're right about (1) (and I've hit that countless
times), but would rather err on the side of accidentally running too
many tests than not enough. If we make manual edits to what can be
inferred by the build graph, let's make it a blacklist rather than an
allow list to avoid accidental lost coverage.

> We make these tradeoffs all the time, of course, via putting some tests in 
> *IT and postCommit runs and some in *Test, implicitly preCommit. But I am 
> imagining a future where we can decouple the test suite definitions (very 
> stable, not depending on the project context) from the decision of where and 
> when to run them (less stable, chan

Re: Finer-grained test runs?

2020-07-09 Thread Luke Cwik
No, not without doing the research myself to see what current tooling is
available.

On Thu, Jul 9, 2020 at 1:17 PM Kenneth Knowles  wrote:

>
>
> On Thu, Jul 9, 2020 at 1:10 PM Luke Cwik  wrote:
>
>> The budget would represent some criteria that we need from tests (e.g.
>> percent passed, max num skipped tests, test execution time, ...). If we
>> fail the criteria then there must be actionable work (such as fix tests)
>> followed with something that prevents the status quo from continuing (such
>> as preventing releases/features being merged) until the criteria is
>> satisfied again.
>>
>
> +1 . This is aligned with "CI as monitoring/alerting of the health of the
> machine that is your evolving codebase", which I very much subscribe to.
> Alert when something is wrong (another missing piece: have a quick way to
> ack and suppress false alarms in those cases you really want a sensitive
> alert).
>
> Do you know good implementation choices in Gradle/JUnit/Jenkins? (asking
> before searching for it myself)
>
> Kenn
>
>
>> On Thu, Jul 9, 2020 at 1:00 PM Kenneth Knowles  wrote:
>>
>>>
>>>
>>> On Thu, Jul 9, 2020 at 11:47 AM Robert Bradshaw 
>>> wrote:
>>>
 On Thu, Jul 9, 2020 at 8:40 AM Luke Cwik  wrote:
 >
 >> If Brian's: it does not result in redundant build (if plugin works)
 since it would be one Gradle build process. But it does do a full build if
 you touch something at the root of the ancestry tree like core SDK or
 model. I would like to avoid automatically testing descendants if we can,
 since things like Nexmark and most IOs are not sensitive to the vast
 majority of model or core SDK changes. Runners are borderline.
 >
 > I believe that the cost of fixing an issue that is found later once
 the test starts failing because the test wasn't run as part of the PR has a
 much higher order of magnitude of cost to triage and fix. Mostly due to
 loss of context from the PR author/reviewer and if the culprit PR can't be
 found then whoever is trying to fix it.

 Huge +1 to this.

>>>
>>> Totally agree. This abstract statement is clearly true. I suggest
>>> considering things more concretely.
>>>
>>> Ideally we could count on the build system (and good caching) to only
 test what actually needs to be tested, and with work being done on
 runners and IOs this would be a small subset of our entire suite. When
 working lower in the stack (and I am prone to do) I think it's
 acceptable to have longer wait times--and would *much* rather pay that
 price than discover things later. Perhaps some things could be
 surgically removed (it would be interesting to mine data on how often
 test failures in the "leaves" catch real issues), but I would do that
 with care. That being said, flakiness is really an issues (and it
 seems these days I have to re-run tests, often multiple times, to get
 a PR to green; splitting up jobs could help that as well).

>>>
>>> Agree with your sentiment that a longer wait for core changes is
>>> generally fine; my phrasing above overemphasized this case. Anecdotally,
>>> without mining data, leaf modules do catch bugs in core changes sometimes
>>> when (by definition) they are not adequately tested. This is a good measure
>>> for how much we have to improve our engineering practices.
>>>
>>> But anyhow this is one very special case. Coming back to the overall
>>> issue, what we actually do today is run all leaf/middle/root builds
>>> whenever anything in any leaf/middle/root layer is changed. And we track
>>> greenness and flakiness at this same level of granularity.
>>>
>>> Recall my (non-binding) starting point guessing at what tests should or
>>> should not run in some scenarios: (this tangent is just about the third
>>> one, where I explicitly said maybe we run all the same tests and then we
>>> want to focus on separating signals as Luke pointed out)
>>>
>>> > - changing an IO or runner would not trigger the 20 minutes of core
>>> SDK tests
>>> > - changing a runner would not trigger the long IO local integration
>>> tests
>>> > - changing the core SDK could potentially not run as many tests in
>>> presubmit, but maybe it would and they would be separately reported results
>>> with clear flakiness signal
>>>
>>> And let's consider even more concrete examples:
>>>
>>>  - when changing a Fn API proto, how important is it to run
>>> RabbitMqIOTest?
>>>  - when changing JdbcIO, how important is it to run the Java SDK
>>> needsRunnerTests? RabbitMqIOTest?
>>>  - when changing the FlinkRunner, how important is it to make sure that
>>> Nexmark queries still match their models when run on direct runner?
>>>
>>> I chose these examples to all have zero value, of course. And I've
>>> deliberately included an example of a core change and a leaf test. Not all
>>> (core change, leaf test) pairs are equally important. The vast majority of
>>> all tests we run are literally unable to be aff

Re: Contributor permission for Beam Jira ticket

2020-07-09 Thread Luke Cwik
Welcome.

I have added you to Beam's Jira.

On Thu, Jul 9, 2020 at 9:30 AM Jiahao Wu  wrote:

> Hi,
>
> This is Jiahao from Google. I am working in the Google Cloud HCLS team
> this summer and we want to add an IO connector for our API to better
> support our customers. Can someone add me as a contributor for Beam's Jira
> issue tracker so I can create/assign tickets for my work?
> My Jira username is: jiahaowu
>
> Thanks,
> Jiahao
>


Re: KinesisIO Tests - are they run anywhere?

2020-07-09 Thread Luke Cwik
It has come up a few times [1, 2, 3, 4] and there have also been a few
comments over time about whether someone could donate AWS resources to the
project.

1: https://issues.apache.org/jira/browse/BEAM-601
2: https://issues.apache.org/jira/browse/BEAM-3373
3: https://issues.apache.org/jira/browse/BEAM-3550
4: https://issues.apache.org/jira/browse/BEAM-3032

On Thu, Jul 9, 2020 at 1:02 PM Mani Kolbe  wrote:

> Have you guys considered using localstack to run AWS service based
> integration tests?
>
> https://github.com/localstack/localstack
>
> On Thu, 9 Jul, 2020, 5:25 PM Piotr Szuberski, 
> wrote:
>
>> Yeah, I meant KinesisIOIT tests. I'll do the same with the cross-language
>> it tests then. Thanks for your reply :)
>>
>> On 2020/07/08 17:13:11, Alexey Romanenko 
>> wrote:
>> > If you mean Java KinesisIO tests, then unit tests are running on
>> Jenkins [1] and ITs are not running since it requires AWS credentials that
>> we don’t have dedicated to Beam for the moment.
>> >
>> > In the same time, you can run KinesisIOIT with your own credentials,
>> like we do in Talend (a company that I work for).
>> >
>> > [1]
>> https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/12209/testReport/org.apache.beam.sdk.io.kinesis/
>> <
>> https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/12209/testReport/org.apache.beam.sdk.io.kinesis/
>> >
>> >
>> > > On 8 Jul 2020, at 13:11, Piotr Szuberski 
>> wrote:
>> > >
>> > > I'm writing KinesisIO external transform with python wrapper and I
>> found that the tests aren't executed anywhere in Jenkins. Am I wrong or
>> there is a reason for that?
>> >
>> >
>>
>


Re: Finer-grained test runs?

2020-07-09 Thread Kenneth Knowles
On Thu, Jul 9, 2020 at 1:10 PM Luke Cwik  wrote:

> The budget would represent some criteria that we need from tests (e.g.
> percent passed, max num skipped tests, test execution time, ...). If we
> fail the criteria then there must be actionable work (such as fix tests)
> followed with something that prevents the status quo from continuing (such
> as preventing releases/features being merged) until the criteria is
> satisfied again.
>

+1 . This is aligned with "CI as monitoring/alerting of the health of the
machine that is your evolving codebase", which I very much subscribe to.
Alert when something is wrong (another missing piece: have a quick way to
ack and suppress false alarms in those cases you really want a sensitive
alert).

Do you know good implementation choices in Gradle/JUnit/Jenkins? (asking
before searching for it myself)

Kenn


> On Thu, Jul 9, 2020 at 1:00 PM Kenneth Knowles  wrote:
>
>>
>>
>> On Thu, Jul 9, 2020 at 11:47 AM Robert Bradshaw 
>> wrote:
>>
>>> On Thu, Jul 9, 2020 at 8:40 AM Luke Cwik  wrote:
>>> >
>>> >> If Brian's: it does not result in redundant build (if plugin works)
>>> since it would be one Gradle build process. But it does do a full build if
>>> you touch something at the root of the ancestry tree like core SDK or
>>> model. I would like to avoid automatically testing descendants if we can,
>>> since things like Nexmark and most IOs are not sensitive to the vast
>>> majority of model or core SDK changes. Runners are borderline.
>>> >
>>> > I believe that the cost of fixing an issue that is found later once
>>> the test starts failing because the test wasn't run as part of the PR has a
>>> much higher order of magnitude of cost to triage and fix. Mostly due to
>>> loss of context from the PR author/reviewer and if the culprit PR can't be
>>> found then whoever is trying to fix it.
>>>
>>> Huge +1 to this.
>>>
>>
>> Totally agree. This abstract statement is clearly true. I suggest
>> considering things more concretely.
>>
>> Ideally we could count on the build system (and good caching) to only
>>> test what actually needs to be tested, and with work being done on
>>> runners and IOs this would be a small subset of our entire suite. When
>>> working lower in the stack (and I am prone to do) I think it's
>>> acceptable to have longer wait times--and would *much* rather pay that
>>> price than discover things later. Perhaps some things could be
>>> surgically removed (it would be interesting to mine data on how often
>>> test failures in the "leaves" catch real issues), but I would do that
>>> with care. That being said, flakiness is really an issues (and it
>>> seems these days I have to re-run tests, often multiple times, to get
>>> a PR to green; splitting up jobs could help that as well).
>>>
>>
>> Agree with your sentiment that a longer wait for core changes is
>> generally fine; my phrasing above overemphasized this case. Anecdotally,
>> without mining data, leaf modules do catch bugs in core changes sometimes
>> when (by definition) they are not adequately tested. This is a good measure
>> for how much we have to improve our engineering practices.
>>
>> But anyhow this is one very special case. Coming back to the overall
>> issue, what we actually do today is run all leaf/middle/root builds
>> whenever anything in any leaf/middle/root layer is changed. And we track
>> greenness and flakiness at this same level of granularity.
>>
>> Recall my (non-binding) starting point guessing at what tests should or
>> should not run in some scenarios: (this tangent is just about the third
>> one, where I explicitly said maybe we run all the same tests and then we
>> want to focus on separating signals as Luke pointed out)
>>
>> > - changing an IO or runner would not trigger the 20 minutes of core SDK
>> tests
>> > - changing a runner would not trigger the long IO local integration
>> tests
>> > - changing the core SDK could potentially not run as many tests in
>> presubmit, but maybe it would and they would be separately reported results
>> with clear flakiness signal
>>
>> And let's consider even more concrete examples:
>>
>>  - when changing a Fn API proto, how important is it to run
>> RabbitMqIOTest?
>>  - when changing JdbcIO, how important is it to run the Java SDK
>> needsRunnerTests? RabbitMqIOTest?
>>  - when changing the FlinkRunner, how important is it to make sure that
>> Nexmark queries still match their models when run on direct runner?
>>
>> I chose these examples to all have zero value, of course. And I've
>> deliberately included an example of a core change and a leaf test. Not all
>> (core change, leaf test) pairs are equally important. The vast majority of
>> all tests we run are literally unable to be affected by the changes
>> triggering the test. So that's why enabling Gradle cache or using a plugin
>> like Brian found could help part of the issue, but not the whole issue,
>> again as Luke reminded.
>>
>> We make these tradeoffs all the time, of course, vi

Re: Finer-grained test runs?

2020-07-09 Thread Luke Cwik
The budget would represent some criteria that we need from tests (e.g.
percent passed, max num skipped tests, test execution time, ...). If we
fail the criteria then there must be actionable work (such as fix tests)
followed with something that prevents the status quo from continuing (such
as preventing releases/features being merged) until the criteria is
satisfied again.

On Thu, Jul 9, 2020 at 1:00 PM Kenneth Knowles  wrote:

>
>
> On Thu, Jul 9, 2020 at 11:47 AM Robert Bradshaw 
> wrote:
>
>> On Thu, Jul 9, 2020 at 8:40 AM Luke Cwik  wrote:
>> >
>> >> If Brian's: it does not result in redundant build (if plugin works)
>> since it would be one Gradle build process. But it does do a full build if
>> you touch something at the root of the ancestry tree like core SDK or
>> model. I would like to avoid automatically testing descendants if we can,
>> since things like Nexmark and most IOs are not sensitive to the vast
>> majority of model or core SDK changes. Runners are borderline.
>> >
>> > I believe that the cost of fixing an issue that is found later once the
>> test starts failing because the test wasn't run as part of the PR has a
>> much higher order of magnitude of cost to triage and fix. Mostly due to
>> loss of context from the PR author/reviewer and if the culprit PR can't be
>> found then whoever is trying to fix it.
>>
>> Huge +1 to this.
>>
>
> Totally agree. This abstract statement is clearly true. I suggest
> considering things more concretely.
>
> Ideally we could count on the build system (and good caching) to only
>> test what actually needs to be tested, and with work being done on
>> runners and IOs this would be a small subset of our entire suite. When
>> working lower in the stack (and I am prone to do) I think it's
>> acceptable to have longer wait times--and would *much* rather pay that
>> price than discover things later. Perhaps some things could be
>> surgically removed (it would be interesting to mine data on how often
>> test failures in the "leaves" catch real issues), but I would do that
>> with care. That being said, flakiness is really an issues (and it
>> seems these days I have to re-run tests, often multiple times, to get
>> a PR to green; splitting up jobs could help that as well).
>>
>
> Agree with your sentiment that a longer wait for core changes is generally
> fine; my phrasing above overemphasized this case. Anecdotally, without
> mining data, leaf modules do catch bugs in core changes sometimes when (by
> definition) they are not adequately tested. This is a good measure for how
> much we have to improve our engineering practices.
>
> But anyhow this is one very special case. Coming back to the overall
> issue, what we actually do today is run all leaf/middle/root builds
> whenever anything in any leaf/middle/root layer is changed. And we track
> greenness and flakiness at this same level of granularity.
>
> Recall my (non-binding) starting point guessing at what tests should or
> should not run in some scenarios: (this tangent is just about the third
> one, where I explicitly said maybe we run all the same tests and then we
> want to focus on separating signals as Luke pointed out)
>
> > - changing an IO or runner would not trigger the 20 minutes of core SDK
> tests
> > - changing a runner would not trigger the long IO local integration tests
> > - changing the core SDK could potentially not run as many tests in
> presubmit, but maybe it would and they would be separately reported results
> with clear flakiness signal
>
> And let's consider even more concrete examples:
>
>  - when changing a Fn API proto, how important is it to run RabbitMqIOTest?
>  - when changing JdbcIO, how important is it to run the Java SDK
> needsRunnerTests? RabbitMqIOTest?
>  - when changing the FlinkRunner, how important is it to make sure that
> Nexmark queries still match their models when run on direct runner?
>
> I chose these examples to all have zero value, of course. And I've
> deliberately included an example of a core change and a leaf test. Not all
> (core change, leaf test) pairs are equally important. The vast majority of
> all tests we run are literally unable to be affected by the changes
> triggering the test. So that's why enabling Gradle cache or using a plugin
> like Brian found could help part of the issue, but not the whole issue,
> again as Luke reminded.
>
> We make these tradeoffs all the time, of course, via putting some tests in
> *IT and postCommit runs and some in *Test, implicitly preCommit. But I am
> imagining a future where we can decouple the test suite definitions (very
> stable, not depending on the project context) from the decision of where
> and when to run them (less stable, changing as the project changes).
>
> My assumption is that the project will only grow and all these problems
> (flakiness, runtime, false coupling) will continue to get worse. I raised
> this now so we could consider what is a steady state approach that could
> scale, before it 

Re: KinesisIO Tests - are they run anywhere?

2020-07-09 Thread Mani Kolbe
Have you guys considered using localstack to run AWS service based
integration tests?

https://github.com/localstack/localstack
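
For example, an integration test could point its Kinesis client at the local
endpoint instead of real AWS, roughly like this (an illustrative Python
sketch only; the stream name, dummy credentials, and localstack's default
edge port 4566 are assumptions):

import boto3

# Create a Kinesis client against a locally running localstack instance.
kinesis = boto3.client(
    'kinesis',
    endpoint_url='http://localhost:4566',
    region_name='us-east-1',
    aws_access_key_id='test',
    aws_secret_access_key='test')

kinesis.create_stream(StreamName='beam-it-stream', ShardCount=1)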

On Thu, 9 Jul, 2020, 5:25 PM Piotr Szuberski, 
wrote:

> Yeah, I meant KinesisIOIT tests. I'll do the same with the cross-language
> it tests then. Thanks for your reply :)
>
> On 2020/07/08 17:13:11, Alexey Romanenko 
> wrote:
> > If you mean Java KinesisIO tests, then unit tests are running on Jenkins
> [1] and ITs are not running since it requires AWS credentials that we don’t
> have dedicated to Beam for the moment.
> >
> > In the same time, you can run KinesisIOIT with your own credentials,
> like we do in Talend (a company that I work for).
> >
> > [1]
> https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/12209/testReport/org.apache.beam.sdk.io.kinesis/
> <
> https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/12209/testReport/org.apache.beam.sdk.io.kinesis/
> >
> >
> > > On 8 Jul 2020, at 13:11, Piotr Szuberski 
> wrote:
> > >
> > > I'm writing KinesisIO external transform with python wrapper and I
> found that the tests aren't executed anywhere in Jenkins. Am I wrong or
> there is a reason for that?
> >
> >
>


Re: Finer-grained test runs?

2020-07-09 Thread Kenneth Knowles
On Thu, Jul 9, 2020 at 11:47 AM Robert Bradshaw  wrote:

> On Thu, Jul 9, 2020 at 8:40 AM Luke Cwik  wrote:
> >
> >> If Brian's: it does not result in redundant build (if plugin works)
> since it would be one Gradle build process. But it does do a full build if
> you touch something at the root of the ancestry tree like core SDK or
> model. I would like to avoid automatically testing descendants if we can,
> since things like Nexmark and most IOs are not sensitive to the vast
> majority of model or core SDK changes. Runners are borderline.
> >
> > I believe that the cost of fixing an issue that is found later once the
> test starts failing because the test wasn't run as part of the PR has a
> much higher order of magnitude of cost to triage and fix. Mostly due to
> loss of context from the PR author/reviewer and if the culprit PR can't be
> found then whoever is trying to fix it.
>
> Huge +1 to this.
>

Totally agree. This abstract statement is clearly true. I suggest
considering things more concretely.

Ideally we could count on the build system (and good caching) to only
> test what actually needs to be tested, and with work being done on
> runners and IOs this would be a small subset of our entire suite. When
> working lower in the stack (and I am prone to do) I think it's
> acceptable to have longer wait times--and would *much* rather pay that
> price than discover things later. Perhaps some things could be
> surgically removed (it would be interesting to mine data on how often
> test failures in the "leaves" catch real issues), but I would do that
> with care. That being said, flakiness is really an issues (and it
> seems these days I have to re-run tests, often multiple times, to get
> a PR to green; splitting up jobs could help that as well).
>

Agree with your sentiment that a longer wait for core changes is generally
fine; my phrasing above overemphasized this case. Anecdotally, without
mining data, leaf modules do catch bugs in core changes sometimes when (by
definition) they are not adequately tested. This is a good measure for how
much we have to improve our engineering practices.

But anyhow this is one very special case. Coming back to the overall issue,
what we actually do today is run all leaf/middle/root builds whenever
anything in any leaf/middle/root layer is changed. And we track greenness
and flakiness at this same level of granularity.

Recall my (non-binding) starting point guessing at what tests should or
should not run in some scenarios: (this tangent is just about the third
one, where I explicitly said maybe we run all the same tests and then we
want to focus on separating signals as Luke pointed out)

> - changing an IO or runner would not trigger the 20 minutes of core SDK
tests
> - changing a runner would not trigger the long IO local integration tests
> - changing the core SDK could potentially not run as many tests in
presubmit, but maybe it would and they would be separately reported results
with clear flakiness signal

And let's consider even more concrete examples:

 - when changing a Fn API proto, how important is it to run RabbitMqIOTest?
 - when changing JdbcIO, how important is it to run the Java SDK
needsRunnerTests? RabbitMqIOTest?
 - when changing the FlinkRunner, how important is it to make sure that
Nexmark queries still match their models when run on direct runner?

I chose these examples to all have zero value, of course. And I've
deliberately included an example of a core change and a leaf test. Not all
(core change, leaf test) pairs are equally important. The vast majority of
all tests we run are literally unable to be affected by the changes
triggering the test. So that's why enabling Gradle cache or using a plugin
like Brian found could help part of the issue, but not the whole issue,
again as Luke reminded.

We make these tradeoffs all the time, of course, via putting some tests in
*IT and postCommit runs and some in *Test, implicitly preCommit. But I am
imagining a future where we can decouple the test suite definitions (very
stable, not depending on the project context) from the decision of where
and when to run them (less stable, changing as the project changes).

My assumption is that the project will only grow and all these problems
(flakiness, runtime, false coupling) will continue to get worse. I raised
this now so we could consider what is a steady state approach that could
scale, before it becomes an emergency. I take it as a given that it is
harder to change culture than it is to change infra/code, so I am not
considering any possibility of more attention to flaky tests or more
attention to testing the core properly or more attention to making tests
snappy or more careful consideration of *IT and *Test. (unless we build
infra that forces more attention to these things)

Incidentally, SQL is not actually fully factored out. If you edit SQL it
runs a limited subset defined by :sqlPreCommit. If you edit core, then
:javaPreCommit still includes SQL te

Re: Monitoring performance for releases

2020-07-09 Thread Maximilian Michels
Not yet, I just learned about the migration to a new frontend, including 
a new backend (InfluxDB instead of BigQuery).



 - Are the metrics available on metrics.beam.apache.org?


Is http://metrics.beam.apache.org online? I was never able to access it.


 - What is the feature delta between using metrics.beam.apache.org (much
better UI) and using apache-beam-testing.appspot.com?


AFAIK it is an ongoing migration and the delta appears to be high.


 - Can we notice regressions faster than release cadence?


Absolutely! A report with the latest numbers including statistics about 
the growth of metrics would be useful.



 - Can we get automated alerts?


I think we could set up a Jenkins job to do this.

-Max

On 09.07.20 20:26, Kenneth Knowles wrote:

Questions:

  - Are the metrics available on metrics.beam.apache.org?
  - What is the feature delta between using metrics.beam.apache.org (much
better UI) and using apache-beam-testing.appspot.com?

  - Can we notice regressions faster than release cadence?
  - Can we get automated alerts?

Kenn

On Thu, Jul 9, 2020 at 10:21 AM Maximilian Michels wrote:


Hi,

We recently saw an increase in latency migrating from Beam 2.18.0 to
2.21.0 (Python SDK with Flink Runner). This proved very hard to debug,
and it looks like each version in between the two versions led to
increased latency.

This is not the first time we saw issues when migrating, another
time we
had a decline in checkpointing performance and thus added a
checkpointing test [1] and dashboard [2] (see checkpointing widget).

That makes me wonder if we should monitor performance (throughput /
latency) for basic use cases as part of the release testing. Currently,
our release guide [3] mentions running examples but not evaluating the
performance. I think it would be good practice to check relevant charts
with performance measurements as part of the release process. The
release guide should reflect that.

WDYT?

-Max

PS: Of course, this requires tests and metrics to be available. This PR
adds latency measurements to the load tests [4].


[1] https://github.com/apache/beam/pull/11558
[2]
https://apache-beam-testing.appspot.com/explore?dashboard=5751884853805056
[3] https://beam.apache.org/contribute/release-guide/
[4] https://github.com/apache/beam/pull/12065



Re: Finer-grained test runs?

2020-07-09 Thread Robert Bradshaw
On Thu, Jul 9, 2020 at 8:40 AM Luke Cwik  wrote:
>
>> If Brian's: it does not result in redundant build (if plugin works) since it 
>> would be one Gradle build process. But it does do a full build if you touch 
>> something at the root of the ancestry tree like core SDK or model. I would 
>> like to avoid automatically testing descendants if we can, since things like 
>> Nexmark and most IOs are not sensitive to the vast majority of model or core 
>> SDK changes. Runners are borderline.
>
> I believe that the cost of fixing an issue that is found later once the test 
> starts failing because the test wasn't run as part of the PR has a much 
> higher order of magnitude of cost to triage and fix. Mostly due to loss of 
> context from the PR author/reviewer and if the culprit PR can't be found then 
> whoever is trying to fix it.

Huge +1 to this.

Ideally we could count on the build system (and good caching) to only
test what actually needs to be tested, and with work being done on
runners and IOs this would be a small subset of our entire suite. When
working lower in the stack (and I am prone to do) I think it's
acceptable to have longer wait times--and would *much* rather pay that
price than discover things later. Perhaps some things could be
surgically removed (it would be interesting to mine data on how often
test failures in the "leaves" catch real issues), but I would do that
with care. That being said, flakiness is really an issue (and it
seems these days I have to re-run tests, often multiple times, to get
a PR to green; splitting up jobs could help that as well).


Re: contributor permission for Beam Jira tickets

2020-07-09 Thread Kenneth Knowles
Done!

On Thu, Jul 9, 2020 at 11:28 AM Damian Gadomski 
wrote:

> Hi,
>
> Can I be added to the JIRA contributors so I can assign tickets to
> myself, please?
>
> my Jira username: damgad
>
> Thanks,
> Damian
>


contributor permission for Beam Jira tickets

2020-07-09 Thread Damian Gadomski
Hi,

Can I be added to the JIRA contributors so I can assign tickets to myself,
please?

my Jira username: damgad

Thanks,
Damian


Re: Monitoring performance for releases

2020-07-09 Thread Kenneth Knowles
Questions:

 - Are the metrics available on metrics.beam.apache.org?
 - What is the feature delta between using metrics.beam.apache.org (much
better UI) and using apache-beam-testing.appspot.com?
 - Can we notice regressions faster than the release cadence?
 - Can we get automated alerts?

Kenn

On Thu, Jul 9, 2020 at 10:21 AM Maximilian Michels  wrote:

> Hi,
>
> We recently saw an increase in latency migrating from Beam 2.18.0 to
> 2.21.0 (Python SDK with Flink Runner). This proved very hard to debug
> and it looks like each version in between led to
> increased latency.
>
> This is not the first time we saw issues when migrating; another time we
> had a decline in checkpointing performance and thus added a
> checkpointing test [1] and dashboard [2] (see checkpointing widget).
>
> That makes me wonder if we should monitor performance (throughput /
> latency) for basic use cases as part of the release testing. Currently,
> our release guide [3] mentions running examples but not evaluating the
> performance. I think it would be good practice to check relevant charts
> with performance measurements as part of the release process. The
> release guide should reflect that.
>
> WDYT?
>
> -Max
>
> PS: Of course, this requires tests and metrics to be available. This PR
> adds latency measurements to the load tests [4].
>
>
> [1] https://github.com/apache/beam/pull/11558
> [2]
> https://apache-beam-testing.appspot.com/explore?dashboard=5751884853805056
> [3] https://beam.apache.org/contribute/release-guide/
> [4] https://github.com/apache/beam/pull/12065
>


Re: Finer-grained test runs?

2020-07-09 Thread Kenneth Knowles
On Thu, Jul 9, 2020 at 8:40 AM Luke Cwik  wrote:

> On Wed, Jul 8, 2020 at 9:22 PM Kenneth Knowles  wrote:
>
>> I like your use of "ancestor" and "descendant". I will adopt it.
>>
>> On Wed, Jul 8, 2020 at 4:53 PM Robert Bradshaw 
>> wrote:
>>
>>> On Wed, Jul 8, 2020 at 4:44 PM Luke Cwik  wrote:
>>> >
>>> > I'm not sure that breaking it up will be significantly faster since
>>> each module needs to build its ancestors and run tests of itself and all of
>>> its descendants which isn't a trivial amount of work. We have only so many
>>> executors and with the increased number of jobs, won't we just be waiting
>>> for queued jobs to start?
>>
>>
>>
>> I think that depends on how many fewer tests we could run (or rerun)
>>> for the average PR. (It would also be nice if we could share build
>>> artifacts across executors (is there something like ccache for
>>> javac?), but maybe that's too far-fetched?)
>>>
>>
>> Robert: The gradle cache should remain valid across runs, I think... my
>> latest understanding was that it was a robust up-to-date check (aka not
>> `make`). We may have messed this up, as I am not seeing as much caching as
>> I would expect nor as much as I see locally. We had to do some tweaking in
>> the maven days to put the .m2 directory outside of the realm wiped for each
>> new build. Maybe we are clobbering the Gradle cache too. That might
>> actually make most builds so fast we do not care about my proposal.
>>
>
> The gradle cache relies on our inputs/outputs to be specified correctly.
> It's great that this has been fixed since I was under the impression that
> it was disabled and/or that we used --rerun-tasks everywhere.
>

Sorry, when I said *should* I meant that if it is not currently being used,
we should do what it takes to use it. Based on the scans, I don't think
test results are being cached. But I could have read things wrong...


Luke: I am not sure if you are replying to my email or to Brian's.
>>
> If Brian's: it does not result in redundant build (if plugin works) since
>> it would be one Gradle build process. But it does do a full build if you
>> touch something at the root of the ancestry tree like core SDK or model. I
>> would like to avoid automatically testing descendants if we can, since
>> things like Nexmark and most IOs are not sensitive to the vast majority of
>> model or core SDK changes. Runners are borderline.
>>
>
> I believe that an issue found later, once the test starts failing because
> it wasn't run as part of the PR, is an order of magnitude more costly to
> triage and fix. Mostly due to loss of context for the PR author/reviewer,
> and, if the culprit PR can't be found, for whoever is trying to fix it.
>
> If we are willing to not separate out into individual jobs then we are
> really trying to make the job faster.
>

It would also reduce flakiness, which was a key motivator for this thread.
It is a good point about separate signals, which I somehow forgot in
between emails. So an approach based on separate jobs is not strictly
worse, since it has this benefit.


> How much digging have folks done into the build scans? They show a lot
> of details that are useful about what is slow for a specific job. Take the
> Java Precommit for example:
> * The timeline of what tasks ran when:
> https://scans.gradle.com/s/u2rkcnww2fs24/timeline (looks like nexmark
> testing is 30 mins long and is the straggler)
>

I did a bit of this digging the other day. Separating Nexmark out from Java
(as we did with SQL) would be a mitigation that addresses job speed. I
planned on doing this today. Separating out each of the 10-minute IO and
runner runs would also improve speed and reduce flakiness, but then this is
turning into a longer task. Doing this with include/exclude patterns in job
files is simple [1] but will get harder to keep consistent. I would guess
they are already inconsistent.

Here's a sketch of one way that this can scale: have the metadata that
defines trigger patterns and test targets live next to the modules. Then it
scales just as well as authoring modules does. You need some code to
assemble the appropriate job triggers from the declared ancestry. This
could have the benefit that the signal is for a module and not for a job.
Changing the triggers or refactoring how different things run would not
reset the meaning of the signal, as it does now.
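
To make that concrete, here is a minimal sketch in Java of what such
per-module metadata, plus the code that assembles trigger path patterns from
the declared ancestry, could look like. Everything here is hypothetical: no
such metadata format or classes exist in Beam, and the real job files are
Groovy, so this only illustrates the idea.

// Hypothetical sketch only: the metadata format and class names are made up.
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TriggerAssembler {

  /** Per-module metadata that would live next to the module (names are made up). */
  public static class ModuleMetadata {
    final String path;             // e.g. "sdks/java/io/kafka"
    final List<String> ancestors;  // modules whose changes should re-trigger this one
    final List<String> testTargets; // Gradle targets to run, e.g. ":sdks:java:io:kafka:test"

    ModuleMetadata(String path, List<String> ancestors, List<String> testTargets) {
      this.path = path;
      this.ancestors = ancestors;
      this.testTargets = testTargets;
    }
  }

  /**
   * Collects the trigger path patterns for one module: its own directory plus
   * the directories of every declared ancestor, transitively.
   */
  public static Set<String> triggerPatterns(String module, Map<String, ModuleMetadata> all) {
    Set<String> patterns = new LinkedHashSet<>();
    List<String> toVisit = new ArrayList<>();
    toVisit.add(module);
    while (!toVisit.isEmpty()) {
      ModuleMetadata meta = all.get(toVisit.remove(toVisit.size() - 1));
      if (meta == null || !patterns.add(meta.path + "/**")) {
        continue; // unknown module, or already visited
      }
      toVisit.addAll(meta.ancestors);
    }
    return patterns;
  }
}

The job generator would then emit one Jenkins job (and one status signal) per
module from this data, rather than hand-maintained include/exclude lists.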


> * It looks like our build cache (
> https://scans.gradle.com/s/u2rkcnww2fs24/performance/buildCache) is
> saving about 5% of total CPU time; should we consider setting up a remote
> build cache?
>
>> If mine: you could assume my proposal is like Brian's but with full
>> isolated Jenkins builds. This would be strictly worse, since it would add
>> redundant builds of ancestors. I am assuming that you always run a separate
>> Jenkins job for every descendant. Still, many modules have fewer
>> descendants. And they do not trigger all the way up to the root and down to
>> all descendants of the root.

Monitoring performance for releases

2020-07-09 Thread Maximilian Michels

Hi,

We recently saw an increase in latency migrating from Beam 2.18.0 to 
2.21.0 (Python SDK with Flink Runner). This proved very hard to debug
and it looks like each version in between led to
increased latency.


This is not the first time we saw issues when migrating; another time we
had a decline in checkpointing performance and thus added a 
checkpointing test [1] and dashboard [2] (see checkpointing widget).


That makes me wonder if we should monitor performance (throughput / 
latency) for basic use cases as part of the release testing. Currently, 
our release guide [3] mentions running examples but not evaluating the 
performance. I think it would be good practice to check relevant charts 
with performance measurements as part of the release process. The
release guide should reflect that.


WDYT?

-Max

PS: Of course, this requires tests and metrics to be available. This PR 
adds latency measurements to the load tests [4].



[1] https://github.com/apache/beam/pull/11558
[2] 
https://apache-beam-testing.appspot.com/explore?dashboard=5751884853805056

[3] https://beam.apache.org/contribute/release-guide/
[4] https://github.com/apache/beam/pull/12065
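
As a rough illustration (a minimal sketch assuming the Java SDK and made-up
metric names; the linked PR may measure latency quite differently),
per-element latency could be recorded with Beam's Metrics API, using the
event-time-to-processing-time lag as a proxy:

import org.apache.beam.sdk.metrics.Distribution;
import org.apache.beam.sdk.metrics.Metrics;
import org.apache.beam.sdk.transforms.DoFn;
import org.joda.time.Instant;

/** Records the gap between an element's event timestamp and the wall clock when processed. */
public class MeasureLatencyFn<T> extends DoFn<T, T> {

  // Namespace/name are arbitrary placeholders; dashboards would query this distribution.
  private final Distribution latencyMs =
      Metrics.distribution("load_tests", "element_latency_ms");

  @ProcessElement
  public void processElement(ProcessContext c) {
    long lagMillis = Instant.now().getMillis() - c.timestamp().getMillis();
    latencyMs.update(lagMillis);
    c.output(c.element());
  }
}

Applied via ParDo.of(new MeasureLatencyFn<>()) near the end of the pipeline
under test, the distribution would then surface alongside the job's other
metrics for the dashboards mentioned above.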


Contributor permission for Beam Jira ticket

2020-07-09 Thread Jiahao Wu
Hi,

This is Jiahao from Google. I am working in the Google Cloud HCLS team this
summer and we want to add an IO connector for our API to better support our
customers. Can someone add me as a contributor for Beam's Jira issue
tracker so I can create/assign tickets for my work?
My Jira username is: jiahaowu

Thanks,
Jiahao


Re: KinesisIO Tests - are they run anywhere?

2020-07-09 Thread Piotr Szuberski
Yeah, I meant KinesisIOIT tests. I'll do the same with the cross-language IT 
tests then. Thanks for your reply :)

On 2020/07/08 17:13:11, Alexey Romanenko  wrote: 
> If you mean Java KinesisIO tests, then unit tests are running on Jenkins [1] 
> and ITs are not running since they require AWS credentials that we don’t have 
> dedicated to Beam at the moment.
> 
> At the same time, you can run KinesisIOIT with your own credentials, like we 
> do in Talend (a company that I work for).
> 
> [1] 
> https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/12209/testReport/org.apache.beam.sdk.io.kinesis/
>  
> 
> 
> > On 8 Jul 2020, at 13:11, Piotr Szuberski  
> > wrote:
> > 
> > I'm writing a KinesisIO external transform with a Python wrapper and I found 
> > that the tests aren't executed anywhere in Jenkins. Am I wrong, or is there 
> > a reason for that?
> 
> 
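
For anyone who wants to try that locally, below is a rough sketch of reading
from Kinesis with explicitly supplied credentials, assuming the Java KinesisIO
builder methods available around this release. The stream name, region, and
environment variable names are placeholders, and the actual KinesisIOIT wires
credentials through test pipeline options rather than hard-coding them.

import com.amazonaws.regions.Regions;
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kinesis.KinesisIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class KinesisReadWithOwnCredentials {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply(
        KinesisIO.read()
            .withStreamName("my-test-stream") // placeholder stream
            .withAWSClientsProvider(
                System.getenv("AWS_ACCESS_KEY"), // placeholder env vars for your own credentials
                System.getenv("AWS_SECRET_KEY"),
                Regions.EU_WEST_1) // placeholder region
            .withInitialPositionInStream(InitialPositionInStream.LATEST)
            .withMaxNumRecords(1000)); // bound the read so the test terminates

    p.run().waitUntilFinish();
  }
}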


Re: Finer-grained test runs?

2020-07-09 Thread Luke Cwik
On Wed, Jul 8, 2020 at 9:22 PM Kenneth Knowles  wrote:

> I like your use of "ancestor" and "descendant". I will adopt it.
>
> On Wed, Jul 8, 2020 at 4:53 PM Robert Bradshaw 
> wrote:
>
>> On Wed, Jul 8, 2020 at 4:44 PM Luke Cwik  wrote:
>> >
>> > I'm not sure that breaking it up will be significantly faster since
>> each module needs to build its ancestors and run tests of itself and all of
>> its descendants which isn't a trivial amount of work. We have only so many
>> executors and with the increased number of jobs, won't we just be waiting
>> for queued jobs to start?
>
>
>
> I think that depends on how many fewer tests we could run (or rerun)
>> for the average PR. (It would also be nice if we could share build
>> artifacts across executors (is there something like ccache for
>> javac?), but maybe that's too far-fetched?)
>>
>
> Robert: The gradle cache should remain valid across runs, I think... my
> latest understanding was that it was a robust up-to-date check (aka not
> `make`). We may have messed this up, as I am not seeing as much caching as
> I would expect nor as much as I see locally. We had to do some tweaking in
> the maven days to put the .m2 directory outside of the realm wiped for each
> new build. Maybe we are clobbering the Gradle cache too. That might
> actually make most builds so fast we do not care about my proposal.
>

The gradle cache relies on our inputs/outputs to be specified correctly.
It's great that this has been fixed since I was under the impression that
it was disabled and/or that we used --rerun-tasks everywhere.

Luke: I am not sure if you are replying to my email or to Brian's.
>
If Brian's: it does not result in redundant build (if plugin works) since
> it would be one Gradle build process. But it does do a full build if you
> touch something at the root of the ancestry tree like core SDK or model. I
> would like to avoid automatically testing descendants if we can, since
> things like Nexmark and most IOs are not sensitive to the vast majority of
> model or core SDK changes. Runners are borderline.
>

I believe that an issue found later, once the test starts failing because
it wasn't run as part of the PR, is an order of magnitude more costly to
triage and fix. Mostly due to loss of context for the PR author/reviewer,
and, if the culprit PR can't be found, for whoever is trying to fix it.

If we are willing to not separate out into individual jobs then we are
really trying to make the job faster. How much digging have folks done into
the build scans? They show a lot of details that are useful about
what is slow for a specific job. Take the Java Precommit for example:
* The timeline of what tasks ran when:
https://scans.gradle.com/s/u2rkcnww2fs24/timeline (looks like nexmark
testing is 30 mins long and is the straggler)
* It looks like our build cache (
https://scans.gradle.com/s/u2rkcnww2fs24/performance/buildCache) is saving
about 5% of total CPU time; should we consider setting up a remote build
cache?


> If mine: you could assume my proposal is like Brian's but with full
> isolated Jenkins builds. This would be strictly worse, since it would add
> redundant builds of ancestors. I am assuming that you always run a separate
> Jenkins job for every descendant. Still, many modules have fewer
> descendants. And they do not trigger all the way up to the root and down to
> all descendants of the root.
>
>
I was replying to yours since differentiated jobs is what gives visibility.
I agree that Brian's approach would make the build faster if it could
figure out everything that needs to run easily and be easy to maintain.


> From a community perspective, extensions and IOs are the most likely use
> case for newcomers. For the person who comes to add or improve FooIO, it is
> not a good experience to hit a flake in RabbitMqIO or JdbcIO or
> DataflowRunner or FlinkRunner.
>

If flakes had a very low failure budget then as a community this would be a
non-issue.


> I think the plugin Brian mentioned is only a start. It would be even
> better for each module to have an opt-in list of descendants to test on
> precommit. This works well with a rollback-first strategy on post-commit.
> We can then replay the PR while triggering the postcommits that failed.
>
> > I agree that we would have better visibility though in github and also
>> in Jenkins.
>>
>> I do have to say having to scroll through a huge number of github
>> checks is not always an improvement.
>>
>
> +1 but OTOH the gradle scan is sometimes too fine grained or associates
> logs oddly (I skip the Jenkins status page almost always)
>
>
>> > Fixing flaky tests would help improve our test signal as well. Not many
>> willing people here though, but it could be less work than building and
>> maintaining so many different jobs.
>>
>> +1
>>
>
> I agree with fixing flakes, but I want to treat the occurrence and
> resolution of flakiness as standard operations. Just as bug count