Re: Use Coder message for cross-lang ExternalConfigurationPayload?

2020-07-10 Thread Robert Bradshaw
On Fri, Jul 10, 2020 at 4:36 PM Brian Hulette  wrote:
>
> Ah yes I'm +1 for that approach too - it would let us leverage all the 
> schema-inference already in the Java SDK for translating configuration 
> objects which would be great.
> Things on the Python side would be trickier as schemas don't formally support 
> all the types you can use in the PayloadBuilder implementations [1] yet, just 
> NamedTuple. For now we could just make the PayloadBuilder implementations 
> generate Rows without making that translation available for use in 
> PCollections.

Yes, though eventually it might be nice to support all of these
various types as schema'd PCollection elements as well.

> Do we need to worry about update compatibility for 
> ExternalConfigurationPayload?

Technically, each URN defines its payload, and the fact that we've
settled on ExternalConfigurationPayload is a convention. On a
practical note, we haven't declared these protos stable yet. (I would
like to do so before we drop support for Python 2, as external
transforms are a possible escape hatch and the first strong motivation
to have external transforms that span Beam versions).

> [1] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/external.py
>
> On Fri, Jul 10, 2020 at 4:23 PM Robert Bradshaw  wrote:
>>
>> I would be in favor of just using a schema to store the entire
>> configuration. The reason we went with what we have today is that we
>> didn't have cross language schemas yet.
>>
>> On Fri, Jul 10, 2020 at 12:24 PM Brian Hulette  wrote:
>> >
>> > Hi everyone,
>> > I noticed that currently the ExternalConfigurationPayload uses a list of 
>> > coder URNs to represent the coder that was used to serialize each 
>> > configuration field [1]. This seems acceptable at first blush, but there's 
>> > one notable issue: it has no place to store a payload for the coder. Most 
>> > standard coders don't use a payload so it's not a problem, but row coder 
>> > does use a payload to store its schema, which means it can't be used in 
>> > an ExternalConfigurationPayload today.
>> >
>> > Is there a reason not to just use the Coder message [2] in 
>> > ExternalConfigurationPayload instead of a list of coder URNs? That would 
>> > work with row coder, and it would also make it easier to re-use logic for 
>> > translating Pipeline protos.
>> >
>> > I'd be happy to make this change, but I wanted to ask on dev@ in case 
>> > there's something I'm missing here.
>> >
>> > Brian
>> >
>> > [1] 
>> > https://github.com/apache/beam/blob/c54a0b7f49f2eb4a15df115205e2fa455116ccbe/model/pipeline/src/main/proto/external_transforms.proto#L34-L35
>> > [2] 
>> > https://github.com/apache/beam/blob/c54a0b7f49f2eb4a15df115205e2fa455116ccbe/model/pipeline/src/main/proto/beam_runner_api.proto#L542-L555


Re: Use Coder message for cross-lang ExternalConfigurationPayload?

2020-07-10 Thread Brian Hulette
Ah yes I'm +1 for that approach too - it would let us leverage all the
schema-inference already in the Java SDK for translating configuration
objects which would be great.
Things on the Python side would be trickier as schemas don't formally
support all the types you can use in the PayloadBuilder implementations [1]
yet, just NamedTuple. For now we could just make the PayloadBuilder
implementations generate Rows without making that translation available for
use in PCollections.

Do we need to worry about update compatibility for
ExternalConfigurationPayload?

Brian

[1]
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/external.py

On Fri, Jul 10, 2020 at 4:23 PM Robert Bradshaw  wrote:

> I would be in favor of just using a schema to store the entire
> configuration. The reason we went with what we have today is that we
> didn't have cross language schemas yet.
>
> On Fri, Jul 10, 2020 at 12:24 PM Brian Hulette 
> wrote:
> >
> > Hi everyone,
> > I noticed that currently the ExternalConfigurationPayload uses a list of
> coder URNs to represent the coder that was used to serialize each
> configuration field [1]. This seems acceptable at first blush, but there's
> one notable issue: it has no place to store a payload for the coder. Most
> standard coders don't use a payload so it's not a problem, but row coder
> does use a payload to store its schema, which means it can't be used in an
> ExternalConfigurationPayload today.
> >
> > Is there a reason not to just use the Coder message [2] in
> ExternalConfigurationPayload instead of a list of coder URNs? That would
> work with row coder, and it would also make it easier to re-use logic for
> translating Pipeline protos.
> >
> > I'd be happy to make this change, but I wanted to ask on dev@ in case
> there's something I'm missing here.
> >
> > Brian
> >
> > [1]
> https://github.com/apache/beam/blob/c54a0b7f49f2eb4a15df115205e2fa455116ccbe/model/pipeline/src/main/proto/external_transforms.proto#L34-L35
> > [2]
> https://github.com/apache/beam/blob/c54a0b7f49f2eb4a15df115205e2fa455116ccbe/model/pipeline/src/main/proto/beam_runner_api.proto#L542-L555
>


Re: [VOTE] Release 2.23.0, release candidate #1

2020-07-10 Thread Ahmet Altay
I validated the Python 3 quickstarts. I had issues running with the Python
3.8 wheel files, but not with the source distributions or the other
Python wheel files. I have not tested the Python 2 quickstarts.

On Thu, Jul 9, 2020 at 10:53 PM Valentyn Tymofieiev 
wrote:

> Hi everyone,
>
> Please review and vote on the release candidate #1 for the version 2.23.0,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2], which is signed with the key with fingerprint 1DF50603225D29A4 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.23.0-RC1" [5],
> * website pull request listing the release [6], publishing the API
> reference manual [7], and the blog post [8].
> * Java artifacts were built with Maven 3.6.0 and Oracle JDK 1.8.0_201-b09.
> * Python artifacts are deployed along with the source release to the
> dist.apache.org [2].
> * Validation sheet with a tab for 2.23.0 release to help with validation
> [9].
> * Docker images published to Docker Hub [10].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Release Manager
>
> [1]
> https://jira.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12347145
> [2] https://dist.apache.org/repos/dist/dev/beam/2.23.0/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1105/
> [5] https://github.com/apache/beam/tree/v2.23.0-RC1
> [6] https://github.com/apache/beam/pull/12212
> [7] https://github.com/apache/beam-site/pull/605
> [8] https://github.com/apache/beam/pull/12213
> [9]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=596347973
> [10] https://hub.docker.com/search?q=apache%2Fbeam&type=image
>
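
The adoption rule stated in the vote email (majority approval, with at least 3 affirmative PMC votes) can be sketched as a small tally function. This is an illustration only, assuming that only PMC votes are binding and that "majority" means more binding +1 votes than binding -1 votes; the `Vote` class and `release_adopted` function are hypothetical and are not part of any Beam or Apache release tooling.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int    # +1, 0 or -1
    is_pmc: bool  # assumption: only PMC votes are binding


def release_adopted(votes: Iterable[Vote], min_binding_plus_ones: int = 3) -> bool:
    """Majority approval with at least `min_binding_plus_ones` binding +1 votes."""
    binding = [v for v in votes if v.is_pmc]
    plus = sum(1 for v in binding if v.value > 0)
    minus = sum(1 for v in binding if v.value < 0)
    return plus >= min_binding_plus_ones and plus > minus


votes = [Vote("a", +1, True), Vote("b", +1, True), Vote("c", +1, False)]
print(release_adopted(votes))                          # False: only 2 binding +1 votes
print(release_adopted(votes + [Vote("d", +1, True)]))  # True
```

The 72-hour minimum voting window is a separate condition and is not modeled here.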


Re: Use Coder message for cross-lang ExternalConfigurationPayload?

2020-07-10 Thread Robert Bradshaw
I would be in favor of just using a schema to store the entire
configuration. The reason we went with what we have today is that we
didn't have cross language schemas yet.
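
To illustrate what storing the entire configuration under one schema could look like, here is a minimal sketch in plain Python. A NamedTuple (per Brian's reply in this thread, the one type Python schemas formally support today) yields an ordered list of field names and types, and the whole configuration travels as a single schema plus row values instead of one coder URN per field. `ExampleConfig`, `schema_of`, `encode_config`, `decode_config`, and the JSON encoding are hypothetical stand-ins; Beam's actual row coder uses its own binary format and the `Schema` proto.

```python
import json
from typing import List, NamedTuple, Tuple, get_type_hints


class ExampleConfig(NamedTuple):
    # Hypothetical configuration object for some cross-language transform.
    topic: str
    max_records: int
    with_metadata: bool = False


def schema_of(nt_cls: type) -> List[Tuple[str, str]]:
    """Ordered (field name, type name) pairs derived from a NamedTuple class."""
    hints = get_type_hints(nt_cls)
    return [(name, hints[name].__name__) for name in nt_cls._fields]


def encode_config(cfg) -> bytes:
    """Toy encoding: the schema and the row values travel together as JSON."""
    return json.dumps({"schema": schema_of(type(cfg)), "values": list(cfg)}).encode("utf-8")


def decode_config(payload: bytes, nt_cls: type):
    """Rebuild the NamedTuple, refusing payloads written with a different schema."""
    data = json.loads(payload.decode("utf-8"))
    if [tuple(f) for f in data["schema"]] != schema_of(nt_cls):
        raise ValueError("configuration schema mismatch")
    return nt_cls(*data["values"])


cfg = ExampleConfig(topic="events", max_records=10)
print(decode_config(encode_config(cfg), ExampleConfig))
# ExampleConfig(topic='events', max_records=10, with_metadata=False)
```

Because the schema is derived from the configuration type itself, the same inference could in principle be reused on both sides, which is the benefit Brian points out for the Java SDK's existing schema inference.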

On Fri, Jul 10, 2020 at 12:24 PM Brian Hulette  wrote:
>
> Hi everyone,
> I noticed that currently the ExternalConfigurationPayload uses a list of 
> coder URNs to represent the coder that was used to serialize each 
> configuration field [1]. This seems acceptable at first blush, but there's 
> one notable issue: it has no place to store a payload for the coder. Most 
> standard coders don't use a payload so it's not a problem, but row coder does 
> use a payload to store its schema, which means it can't be used in an 
> ExternalConfigurationPayload today.
>
> Is there a reason not to just use the Coder message [2] in 
> ExternalConfigurationPayload instead of a list of coder URNs? That would work 
> with row coder, and it would also make it easier to re-use logic for 
> translating Pipeline protos.
>
> I'd be happy to make this change, but I wanted to ask on dev@ in case there's 
> something I'm missing here.
>
> Brian
>
> [1] 
> https://github.com/apache/beam/blob/c54a0b7f49f2eb4a15df115205e2fa455116ccbe/model/pipeline/src/main/proto/external_transforms.proto#L34-L35
> [2] 
> https://github.com/apache/beam/blob/c54a0b7f49f2eb4a15df115205e2fa455116ccbe/model/pipeline/src/main/proto/beam_runner_api.proto#L542-L555


Re: Versioning published Java containers

2020-07-10 Thread Ahmet Altay
Related to the naming question: +1, and this will be similar to the Python
container naming (e.g. beam_python3.7_sdk).
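
The naming scheme under discussion can be expressed as a tiny helper, assuming images are published under the `apache` Docker Hub organization and tagged with the Beam release version. `sdk_container_image` is hypothetical, and the `beam_java8_sdk` and `beam_java11_sdk` names are the ones proposed in this thread, not confirmed published images.

```python
def sdk_container_image(language: str, version: str, beam_version: str,
                        registry: str = "apache") -> str:
    """Image reference following the beam_<language><version>_sdk pattern."""
    return f"{registry}/beam_{language}{version}_sdk:{beam_version}"


for lang, ver in [("python", "3.7"), ("java", "8"), ("java", "11")]:
    print(sdk_container_image(lang, ver, "2.23.0"))
# apache/beam_python3.7_sdk:2.23.0
# apache/beam_java8_sdk:2.23.0
# apache/beam_java11_sdk:2.23.0
```

Putting the language version in the repository name rather than the tag keeps the tag free to track the Beam release, mirroring the existing Python images.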

On Fri, Jul 10, 2020 at 1:46 PM Pablo Estrada  wrote:

> I agree with Kenn. Dataflow already has some publishing of non-portable
> Java 11 containers, so I think it'll be great to formalize the process for
> portable containers, and let users play with it, and know of its
> availability.
> Best
> -P.
>
> On Fri, Jul 10, 2020 at 9:42 AM Kenneth Knowles  wrote:
>
>> To the initial question: I'm +1 on the rename. The container is primarily
>> something that the SDK should insert into the pipeline proto during
>> construction, and only user-facing in more specialized situations. Given
>> the state of Java and portability, it is a good time to get things named
>> properly and unambiguously. I think a brief announcement to dev@ and user@
>> when it happens is nice-to-have, but no need to give advance warning.
>>
>> Kenn
>>
>> On Fri, Jul 10, 2020 at 7:58 AM Kenneth Knowles  wrote:
>>
>>> I believe Beam already has quite a few users that have forged ahead and
>>> used Java 11 with various runners, pre-portability. Mostly I believe the
>>> Java 11 limitations are with particular features (Schema codegen) and
>>> extensions/IOs/transitive deps.
>>>
>>> When it comes to the container, I'd be interested in looking at test
>>> coverage. The Flink & Spark portable ValidatesRunner suites use EMBEDDED
>>> environment, so they don't exercise the container. The first testing of the
>>> Java SDK harness container against the Python-based Universal Local Runner
>>> is in pull request now [1]. Are there other test suites to highlight? How
>>> hard would it be to run Flink & Spark against the container(s) too?
>>>
>>> Kenn
>>>
>>> [1] https://github.com/apache/beam/pull/11792 (despite the name
>>> ValidatesRunner, in this case it is validating both the runner and harness,
>>> since we don't have a compliance test suite for SDK harnesses)
>>>
>>> On Fri, Jul 10, 2020 at 7:54 AM Tyson Hamilton 
>>> wrote:
>>>
 What do we consider 'ready'?

 Maybe the only required outstanding bugs are supporting the direct
 runner (BEAM-10085), core tests (BEAM-10081), IO tests (BEAM-10084)  to
 start with? Notably this would exclude failing tests like those for GCP
 core, GCPIOs, Dataflow runner, Spark runner, Flink runner, Samza.


 On Thu, Jul 9, 2020 at 4:44 PM Kyle Weaver  wrote:

> My main question is, are we confident the Java 11 container is ready
> to release? AFAIK there are still a number of issues blocking full Java 11
> support (cf [1] ; not
> sure how many of these, if any, affect the SDK harness specifically 
> though.)
>
> For comparison, we recently decided to stop publishing Go SDK
> containers until the Go SDK is considered mature [2]. In the meantime,
> those who want to use the Go SDK can build their own container images from
> source.
>
> Do we already have a Gradle task to build Java 11 containers? If not,
> this would be a good intermediate step, letting users opt in to Java
> 11 without us overpromising support.
>

 We do not. From what I can tell, the build.gradle [1] for the Java
 container is only for the one version. There is a docker file used for
 Jenkins tests.

 [1]
 https://github.com/apache/beam/blob/master/sdks/java/container/build.gradle


>
 When we eventually do the renaming, we can add a note to CHANGES.md [3].
>
> [1] https://issues.apache.org/jira/browse/BEAM-10090
> [2] https://issues.apache.org/jira/browse/BEAM-9685
> [3] https://github.com/apache/beam/blob/master/CHANGES.md
>
> On Thu, Jul 9, 2020 at 3:44 PM Emily Ye  wrote:
>
>> Hi all,
>>
>> I'm getting ramped up on contributing and was looking into adding the
>> Java 11 harness container to releases (
>> https://issues.apache.org/jira/browse/BEAM-8106) - should I rename
>> the current java container so we have two new images `beam_java8_sdk` and
>> `beam_java11_sdk` or hold off on renaming? If we do rename it, what steps
>> should I take to announce/document the change?
>>
>> Thanks,
>> Emily
>>
>


Re: Versioning published Java containers

2020-07-10 Thread Pablo Estrada
I agree with Kenn. Dataflow already has some publishing of non-portable
Java 11 containers, so I think it'll be great to formalize the process for
portable containers, and let users play with it, and know of its
availability.
Best
-P.

On Fri, Jul 10, 2020 at 9:42 AM Kenneth Knowles  wrote:

> To the initial question: I'm +1 on the rename. The container is primarily
> something that the SDK should insert into the pipeline proto during
> construction, and only user-facing in more specialized situations. Given
> the state of Java and portability, it is a good time to get things named
> properly and unambiguously. I think a brief announcement to dev@ and user@
> when it happens is nice-to-have, but no need to give advance warning.
>
> Kenn
>
> On Fri, Jul 10, 2020 at 7:58 AM Kenneth Knowles  wrote:
>
>> I believe Beam already has quite a few users that have forged ahead and
>> used Java 11 with various runners, pre-portability. Mostly I believe the
>> Java 11 limitations are with particular features (Schema codegen) and
>> extensions/IOs/transitive deps.
>>
>> When it comes to the container, I'd be interested in looking at test
>> coverage. The Flink & Spark portable ValidatesRunner suites use EMBEDDED
>> environment, so they don't exercise the container. The first testing of the
>> Java SDK harness container against the Python-based Universal Local Runner
>> is in pull request now [1]. Are there other test suites to highlight? How
>> hard would it be to run Flink & Spark against the container(s) too?
>>
>> Kenn
>>
>> [1] https://github.com/apache/beam/pull/11792 (despite the name
>> ValidatesRunner, in this case it is validating both the runner and harness,
>> since we don't have a compliance test suite for SDK harnesses)
>>
>> On Fri, Jul 10, 2020 at 7:54 AM Tyson Hamilton 
>> wrote:
>>
>>> What do we consider 'ready'?
>>>
>>> Maybe the only required outstanding bugs are supporting the direct
>>> runner (BEAM-10085), core tests (BEAM-10081), IO tests (BEAM-10084)  to
>>> start with? Notably this would exclude failing tests like those for GCP
>>> core, GCPIOs, Dataflow runner, Spark runner, Flink runner, Samza.
>>>
>>>
>>> On Thu, Jul 9, 2020 at 4:44 PM Kyle Weaver  wrote:
>>>
 My main question is, are we confident the Java 11 container is ready to
 release? AFAIK there are still a number of issues blocking full Java 11
 support (cf [1] ; not
 sure how many of these, if any, affect the SDK harness specifically 
 though.)

 For comparison, we recently decided to stop publishing Go SDK
 containers until the Go SDK is considered mature [2]. In the meantime,
 those who want to use the Go SDK can build their own container images from
 source.

 Do we already have a Gradle task to build Java 11 containers? If not,
 this would be a good intermediate step, letting users opt in to Java
 11 without us overpromising support.

>>>
>>> We do not. From what I can tell, the build.gradle [1] for the Java
>>> container is only for the one version. There is a docker file used for
>>> Jenkins tests.
>>>
>>> [1]
>>> https://github.com/apache/beam/blob/master/sdks/java/container/build.gradle
>>>
>>>

>>> When we eventually do the renaming, we can add a note to CHANGES.md [3].

 [1] https://issues.apache.org/jira/browse/BEAM-10090
 [2] https://issues.apache.org/jira/browse/BEAM-9685
 [3] https://github.com/apache/beam/blob/master/CHANGES.md

 On Thu, Jul 9, 2020 at 3:44 PM Emily Ye  wrote:

> Hi all,
>
> I'm getting ramped up on contributing and was looking into adding the
> Java 11 harness container to releases (
> https://issues.apache.org/jira/browse/BEAM-8106) - should I rename
> the current java container so we have two new images `beam_java8_sdk` and
> `beam_java11_sdk` or hold off on renaming? If we do rename it, what steps
> should I take to announce/document the change?
>
> Thanks,
> Emily
>



Re:

2020-07-10 Thread Tyson Hamilton
Welcome!

On Fri, Jul 10, 2020, 10:38 AM Rui Wang  wrote:

> Welcome!
>
>
> -Rui
>
> On Fri, Jul 10, 2020 at 10:33 AM Kenneth Knowles  wrote:
>
>> Welcome to dev@ !
>>
>> On Fri, Jul 10, 2020 at 2:14 AM Maximilian Michels 
>> wrote:
>>
>>> Welcome Emily! Looking forward to your questions.
>>>
>>> Cheers,
>>> Max
>>>
>>> On 08.07.20 20:07, Emily Ye wrote:
>>> > Greetings, dev@beam! Just wanted to introduce myself - I'm a SWE at
>>> Google who will be contributing to Beam going forward. I'm pretty new to
>>> the data processing space but I'm excited to learn, and will probably be
>>> asking lots of questions here. Looking forward to getting to know the
>>> community!
>>> >
>>> > - Emily
>>> >
>>> >
>>> >
>>>
>>


Re: Monitoring performance for releases

2020-07-10 Thread Udi Meiri
On Thu, Jul 9, 2020 at 12:48 PM Maximilian Michels  wrote:

> Not yet, I just learned about the migration to a new frontend, including
> a new backend (InfluxDB instead of BigQuery).
>
> >  - Are the metrics available on metrics.beam.apache.org?
>
> Is http://metrics.beam.apache.org online? I was never able to access it.
>

It doesn't support https. I had to add an exception to the HTTPS Everywhere
extension for "metrics.beam.apache.org".


>
> >  - What is the feature delta between using metrics.beam.apache.org
> (much better UI) and using apache-beam-testing.appspot.com?
>
> AFAIK it is an ongoing migration and the delta appears to be high.
>
> >  - Can we notice regressions faster than release cadence?
>
> Absolutely! A report with the latest numbers including statistics about
> the growth of metrics would be useful.
>
> >  - Can we get automated alerts?
>
> I think we could set up a Jenkins job to do this.
>
> -Max
>
> On 09.07.20 20:26, Kenneth Knowles wrote:
> > Questions:
> >
> >   - Are the metrics available on metrics.beam.apache.org?
> >   - What is the feature delta between using metrics.beam.apache.org
> >     (much better UI) and using apache-beam-testing.appspot.com?
> >   - Can we notice regressions faster than release cadence?
> >   - Can we get automated alerts?
> >
> > Kenn
> >
> > On Thu, Jul 9, 2020 at 10:21 AM Maximilian Michels wrote:
> >
> > Hi,
> >
> > We recently saw an increase in latency migrating from Beam 2.18.0 to
> > 2.21.0 (Python SDK with Flink Runner). This proved very hard to debug,
> > and it looks like each version in between the two led to increased
> > latency.
> >
> > This is not the first time we saw issues when migrating; another time
> > we had a decline in checkpointing performance and thus added a
> > checkpointing test [1] and dashboard [2] (see checkpointing widget).
> >
> > That makes me wonder if we should monitor performance (throughput /
> > latency) for basic use cases as part of the release testing. Currently,
> > our release guide [3] mentions running examples but not evaluating the
> > performance. I think it would be good practice to check relevant charts
> > with performance measurements as part of the release process. The
> > release guide should reflect that.
> >
> > WDYT?
> >
> > -Max
> >
> > PS: Of course, this requires tests and metrics to be available. This PR
> > adds latency measurements to the load tests [4].
> >
> >
> > [1] https://github.com/apache/beam/pull/11558
> > [2] https://apache-beam-testing.appspot.com/explore?dashboard=5751884853805056
> > [3] https://beam.apache.org/contribute/release-guide/
> > [4] https://github.com/apache/beam/pull/12065
> >
>
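
As a sketch of the kind of check a Jenkins job could run to notice regressions faster than the release cadence, the function below flags any metric whose latest value exceeds the median of its recent history by more than a tolerance. It assumes lower is better (for example latency in milliseconds) and that metric histories have already been fetched from whatever backend holds them (BigQuery or InfluxDB, per the thread). `find_regressions`, the metric names, and the 10% default tolerance are hypothetical choices, not existing Beam tooling.

```python
from statistics import median
from typing import Dict, List


def find_regressions(history: Dict[str, List[float]], latest: Dict[str, float],
                     tolerance: float = 0.10) -> List[str]:
    """Names of metrics whose latest value exceeds the historical median by
    more than `tolerance` (a fraction). Assumes lower values are better."""
    flagged = []
    for name, value in latest.items():
        past = history.get(name)
        if not past:
            continue  # no baseline yet, nothing to compare against
        baseline = median(past)
        if value > baseline * (1 + tolerance):
            flagged.append(name)
    return sorted(flagged)


history = {"p50_latency_ms": [100.0, 102.0, 98.0], "checkpoint_ms": [500.0, 510.0, 490.0]}
latest = {"p50_latency_ms": 125.0, "checkpoint_ms": 505.0, "new_metric": 1.0}
print(find_regressions(history, latest))  # ['p50_latency_ms']
```

Using the median rather than the last run as the baseline makes a single noisy run less likely to trigger an alert; a non-empty result could fail the job or send a notification.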




Use Coder message for cross-lang ExternalConfigurationPayload?

2020-07-10 Thread Brian Hulette
Hi everyone,
I noticed that currently the ExternalConfigurationPayload uses a list of
coder URNs to represent the coder that was used to serialize each
configuration field [1]. This seems acceptable at first blush, but there's
one notable issue: it has no place to store a payload for the coder. Most
standard coders don't use a payload so it's not a problem, but row coder
does use a payload to store its schema, which means it can't be used in an
ExternalConfigurationPayload today.

Is there a reason not to just use the Coder message [2] in
ExternalConfigurationPayload instead of a list of coder URNs? That would
work with row coder, and it would also make it easier to re-use logic for
translating Pipeline protos.

I'd be happy to make this change, but I wanted to ask on dev@ in case
there's something I'm missing here.

Brian

[1]
https://github.com/apache/beam/blob/c54a0b7f49f2eb4a15df115205e2fa455116ccbe/model/pipeline/src/main/proto/external_transforms.proto#L34-L35
[2]
https://github.com/apache/beam/blob/c54a0b7f49f2eb4a15df115205e2fa455116ccbe/model/pipeline/src/main/proto/beam_runner_api.proto#L542-L555
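
To make the difference concrete, here is a minimal sketch using simplified Python stand-ins for the proto messages. The dataclasses below are hypothetical and are not the generated Beam classes. A bare list of coder URNs cannot carry the row coder's schema, while a Coder-style message wrapping a FunctionSpec (URN plus payload) can. The URN strings follow Beam's standard coder URNs; `schema_recoverable` and the placeholder schema bytes are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Union


# Simplified, hypothetical stand-ins for FunctionSpec and Coder from
# beam_runner_api.proto; not the generated Beam classes.
@dataclass
class FunctionSpec:
    urn: str
    payload: bytes = b""


@dataclass
class Coder:
    spec: FunctionSpec
    component_coder_ids: List[str] = field(default_factory=list)


VARINT_CODER_URN = "beam:coder:varint:v1"
ROW_CODER_URN = "beam:coder:row:v1"

# Current shape: one URN per configuration field, with no room for a payload.
urn_only: List[str] = [VARINT_CODER_URN, ROW_CODER_URN]

# Proposed shape: a full Coder message per field; the row coder keeps its schema.
PLACEHOLDER_SCHEMA = b"<serialized Schema proto>"  # placeholder bytes, not a real schema
with_coder_messages: List[Coder] = [
    Coder(FunctionSpec(VARINT_CODER_URN)),
    Coder(FunctionSpec(ROW_CODER_URN, PLACEHOLDER_SCHEMA)),
]


def schema_recoverable(coders: Sequence[Union[str, Coder]]) -> bool:
    """Return False if any row coder is missing the schema it needs to decode."""
    for c in coders:
        urn, payload = (c, b"") if isinstance(c, str) else (c.spec.urn, c.spec.payload)
        if urn == ROW_CODER_URN and not payload:
            return False
    return True


print(schema_recoverable(urn_only))             # False
print(schema_recoverable(with_coder_messages))  # True
```

Coders without payloads, such as varint, are unaffected either way, which matches the observation that the gap only shows up for coders like row coder.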


Re:

2020-07-10 Thread Rui Wang
Welcome!


-Rui

On Fri, Jul 10, 2020 at 10:33 AM Kenneth Knowles  wrote:

> Welcome to dev@ !
>
> On Fri, Jul 10, 2020 at 2:14 AM Maximilian Michels  wrote:
>
>> Welcome Emily! Looking forward to your questions.
>>
>> Cheers,
>> Max
>>
>> On 08.07.20 20:07, Emily Ye wrote:
>> > Greetings, dev@beam! Just wanted to introduce myself - I'm a SWE at
>> Google who will be contributing to Beam going forward. I'm pretty new to
>> the data processing space but I'm excited to learn, and will probably be
>> asking lots of questions here. Looking forward to getting to know the
>> community!
>> >
>> > - Emily
>> >
>> >
>> >
>>
>


Re:

2020-07-10 Thread Kenneth Knowles
Welcome to dev@ !

On Fri, Jul 10, 2020 at 2:14 AM Maximilian Michels  wrote:

> Welcome Emily! Looking forward to your questions.
>
> Cheers,
> Max
>
> On 08.07.20 20:07, Emily Ye wrote:
> > Greetings, dev@beam! Just wanted to introduce myself - I'm a SWE at
> Google who will be contributing to Beam going forward. I'm pretty new to
> the data processing space but I'm excited to learn, and will probably be
> asking lots of questions here. Looking forward to getting to know the
> community!
> >
> > - Emily
> >
> >
> >
>


[GitHub] [beam-site] tvalentyn commented on pull request #605: Publish 2.23.0 release

2020-07-10 Thread GitBox


tvalentyn commented on pull request #605:
URL: https://github.com/apache/beam-site/pull/605#issuecomment-656791809


   r: @TheNeuralBit 







Re: Versioning published Java containers

2020-07-10 Thread Kenneth Knowles
To the initial question: I'm +1 on the rename. The container is primarily
something that the SDK should insert into the pipeline proto during
construction, and only user-facing in more specialized situations. Given
the state of Java and portability, it is a good time to get things named
properly and unambiguously. I think a brief announcement to dev@ and user@ when
it happens is nice-to-have, but no need to give advance warning.

Kenn

On Fri, Jul 10, 2020 at 7:58 AM Kenneth Knowles  wrote:

> I believe Beam already has quite a few users that have forged ahead and
> used Java 11 with various runners, pre-portability. Mostly I believe the
> Java 11 limitations are with particular features (Schema codegen) and
> extensions/IOs/transitive deps.
>
> When it comes to the container, I'd be interested in looking at test
> coverage. The Flink & Spark portable ValidatesRunner suites use EMBEDDED
> environment, so they don't exercise the container. The first testing of the
> Java SDK harness container against the Python-based Universal Local Runner
> is in pull request now [1]. Are there other test suites to highlight? How
> hard would it be to run Flink & Spark against the container(s) too?
>
> Kenn
>
> [1] https://github.com/apache/beam/pull/11792 (despite the name
> ValidatesRunner, in this case it is validating both the runner and harness,
> since we don't have a compliance test suite for SDK harnesses)
>
> On Fri, Jul 10, 2020 at 7:54 AM Tyson Hamilton  wrote:
>
>> What do we consider 'ready'?
>>
>> Maybe the only required outstanding bugs are supporting the direct runner
>> (BEAM-10085), core tests (BEAM-10081), IO tests (BEAM-10084)  to start
>> with? Notably this would exclude failing tests like those for GCP core,
>> GCPIOs, Dataflow runner, Spark runner, Flink runner, Samza.
>>
>>
>> On Thu, Jul 9, 2020 at 4:44 PM Kyle Weaver  wrote:
>>
>>> My main question is, are we confident the Java 11 container is ready to
>>> release? AFAIK there are still a number of issues blocking full Java 11
>>> support (cf [1] ; not
>>> sure how many of these, if any, affect the SDK harness specifically though.)
>>>
>>> For comparison, we recently decided to stop publishing Go SDK containers
>>> until the Go SDK is considered mature [2]. In the meantime, those who want
>>> to use the Go SDK can build their own container images from source.
>>>
>>> Do we already have a Gradle task to build Java 11 containers? If not,
>>> this would be a good intermediate step, letting users opt in to Java
>>> 11 without us overpromising support.
>>>
>>
>> We do not. From what I can tell, the build.gradle [1] for the Java
>> container is only for the one version. There is a docker file used for
>> Jenkins tests.
>>
>> [1]
>> https://github.com/apache/beam/blob/master/sdks/java/container/build.gradle
>>
>>
>>>
>> When we eventually do the renaming, we can add a note to CHANGES.md [3].
>>>
>>> [1] https://issues.apache.org/jira/browse/BEAM-10090
>>> [2] https://issues.apache.org/jira/browse/BEAM-9685
>>> [3] https://github.com/apache/beam/blob/master/CHANGES.md
>>>
>>> On Thu, Jul 9, 2020 at 3:44 PM Emily Ye  wrote:
>>>
 Hi all,

 I'm getting ramped up on contributing and was looking into adding the
 Java 11 harness container to releases (
 https://issues.apache.org/jira/browse/BEAM-8106) - should I rename the
 current java container so we have two new images `beam_java8_sdk` and
 `beam_java11_sdk` or hold off on renaming? If we do rename it, what steps
 should I take to announce/document the change?

 Thanks,
 Emily

>>>


Season of Docs 2020 Proposal for Apache Beam (a)

2020-07-10 Thread Season of Docs
Below is a project proposal from a technical writer (bcc'd) who wants to
work with your organization on a Season of Docs project. Please assess the
proposal and ensure that you have a mentor to work with the technical
writer.

If you want to accept the proposal, please submit the technical writing
project to the Season of Docs program administrators. The project selection
form is at this link: . The form
is also available in the guide for organization administrators
.


The deadline for project selections is July 31, 2020 at 20:00 UTC. For
other program deadlines, please see the full timeline
 on the Season
of Docs website.

If you have any questions about the program, please email the Season of
Docs team at season-of-docs-supp...@googlegroups.com.

Best,
The Google Season of Docs team


Title: a
Project length: Standard length (3 months)
Writer information
*Name:* a
*Email:* hj27...@gmail.com


Writing experience: Experience 1:
*Title:* a
*Date:* a
*Description:* a
*Summary:* a

*Sample:* http://www.visionias.in/resources/value_added_material.php
Project Description: a


Season of Docs 2020 Proposal for Apache Beam (Mahima Chowdhury)

2020-07-10 Thread Season of Docs
Below is a project proposal from a technical writer (bcc'd) who wants to
work with your organization on a Season of Docs project. Please assess the
proposal and ensure that you have a mentor to work with the technical
writer.

If you want to accept the proposal, please submit the technical writing
project to the Season of Docs program administrators. The project selection
form is at this link: . The form
is also available in the guide for organization administrators
.


The deadline for project selections is July 31, 2020 at 20:00 UTC. For
other program deadlines, please see the full timeline
 on the Season
of Docs website.

If you have any questions about the program, please email the Season of
Docs team at season-of-docs-supp...@googlegroups.com.

Best,
The Google Season of Docs team


Title: Getting Familiarized with the technicalities of Flink and Spark
Clusters with Beam
Project length: Long running (5 months)
Writer information
*Name:* Mahima Chowdhury
*Email:* mahiexplo...@gmail.com
*Résumé/CV:* https://www.linkedin.com/in/mahima-chowdhury-61a26611b/
*Additional information:* I desperately want to grow in the field of
technical writing.
Project Description: I want my users to learn about Apache Beam in detail.
Along with the introductory section, my users will understand how Beam
integrates with Flink and Spark clusters, to improve the Capability
Matrix.


Re: Versioning published Java containers

2020-07-10 Thread Kenneth Knowles
I believe Beam already has quite a few users that have forged ahead and
used Java 11 with various runners, pre-portability. Mostly I believe the
Java 11 limitations are with particular features (Schema codegen) and
extensions/IOs/transitive deps.

When it comes to the container, I'd be interested in looking at test
coverage. The Flink & Spark portable ValidatesRunner suites use EMBEDDED
environment, so they don't exercise the container. The first testing of the
Java SDK harness container against the Python-based Universal Local Runner
is in pull request now [1]. Are there other test suites to highlight? How
hard would it be to run Flink & Spark against the container(s) too?

Kenn

[1] https://github.com/apache/beam/pull/11792 (despite the name
ValidatesRunner, in this case it is validating both the runner and harness,
since we don't have a compliance test suite for SDK harnesses)

On Fri, Jul 10, 2020 at 7:54 AM Tyson Hamilton  wrote:

> What do we consider 'ready'?
>
> Maybe the only required outstanding bugs are supporting the direct runner
> (BEAM-10085), core tests (BEAM-10081), IO tests (BEAM-10084)  to start
> with? Notably this would exclude failing tests like those for GCP core,
> GCPIOs, Dataflow runner, Spark runner, Flink runner, Samza.
>
>
> On Thu, Jul 9, 2020 at 4:44 PM Kyle Weaver  wrote:
>
>> My main question is, are we confident the Java 11 container is ready to
>> release? AFAIK there are still a number of issues blocking full Java 11
>> support (cf [1] ; not
>> sure how many of these, if any, affect the SDK harness specifically though.)
>>
>> For comparison, we recently decided to stop publishing Go SDK containers
>> until the Go SDK is considered mature [2]. In the meantime, those who want
>> to use the Go SDK can build their own container images from source.
>>
>> Do we already have a Gradle task to build Java 11 containers? If not,
>> this would be a good intermediate step, letting users opt in to Java
>> 11 without us overpromising support.
>>
>
> We do not. From what I can tell, the build.gradle [1] for the Java
> container is only for the one version. There is a docker file used for
> Jenkins tests.
>
> [1]
> https://github.com/apache/beam/blob/master/sdks/java/container/build.gradle
>
>
>>
>> When we eventually do the renaming, we can add a note to CHANGES.md [3].
>>
>> [1] https://issues.apache.org/jira/browse/BEAM-10090
>> [2] https://issues.apache.org/jira/browse/BEAM-9685
>> [3] https://github.com/apache/beam/blob/master/CHANGES.md
>>
>> On Thu, Jul 9, 2020 at 3:44 PM Emily Ye  wrote:
>>
>>> Hi all,
>>>
>>> I'm getting ramped up on contributing and was looking into adding the
>>> Java 11 harness container to releases (
>>> https://issues.apache.org/jira/browse/BEAM-8106) - should I rename the
>>> current java container so we have two new images `beam_java8_sdk` and
>>> `beam_java11_sdk` or hold off on renaming? If we do rename it, what steps
>>> should I take to announce/document the change?
>>>
>>> Thanks,
>>> Emily
>>>
>>


Re: Versioning published Java containers

2020-07-10 Thread Tyson Hamilton
What do we consider 'ready'?

Maybe the only required outstanding bugs are supporting the direct runner
(BEAM-10085), core tests (BEAM-10081), IO tests (BEAM-10084) to start
with? Notably this would exclude failing tests like those for GCP core,
GCPIOs, Dataflow runner, Spark runner, Flink runner, Samza.


On Thu, Jul 9, 2020 at 4:44 PM Kyle Weaver  wrote:

> My main question is, are we confident the Java 11 container is ready to
> release? AFAIK there are still a number of issues blocking full Java 11
> support (cf [1]; not
> sure how many of these, if any, affect the SDK harness specifically though.)
>
> For comparison, we recently decided to stop publishing Go SDK containers
> until the Go SDK is considered mature [2]. In the meantime, those who want
> to use the Go SDK can build their own container images from source.
>
> Do we already have a Gradle task to build Java 11 containers? If not, this
> would be a good intermediate step, letting users opt-in to Java 11 without
> us overpromising support.
>

We do not. From what I can tell, the build.gradle [1] for the Java
container is only for the one version. There is a docker file used for
Jenkins tests.

[1]
https://github.com/apache/beam/blob/master/sdks/java/container/build.gradle


>
> When we eventually do the renaming, we can add a note to CHANGES.md [3].
>
> [1] https://issues.apache.org/jira/browse/BEAM-10090
> [2] https://issues.apache.org/jira/browse/BEAM-9685
> [3] https://github.com/apache/beam/blob/master/CHANGES.md
>
> On Thu, Jul 9, 2020 at 3:44 PM Emily Ye  wrote:
>
>> Hi all,
>>
>> I'm getting ramped up on contributing and was looking into adding the
>> Java 11 harness container to releases (
>> https://issues.apache.org/jira/browse/BEAM-8106) - should I rename the
>> current java container so we have two new images `beam_java8_sdk` and
>> `beam_java11_sdk` or hold off on renaming? If we do rename it, what steps
>> should I take to announce/document the change?
>>
>> Thanks,
>> Emily
>>
>


Season of Docs 2020 Proposal for Apache Beam (Shivkumar Tiwari)

2020-07-10 Thread Season of Docs
Below is a project proposal from a technical writer (bcc'd) who wants to
work with your organization on a Season of Docs project. Please assess the
proposal and ensure that you have a mentor to work with the technical
writer.

If you want to accept the proposal, please submit the technical writing
project to the Season of Docs program administrators. The project selection
form is at this link: . The form
is also available in the guide for organization administrators
.

The deadline for project selections is July 31, 2020 at 20:00 UTC. For
other program deadlines, please see the full timeline on the Season
of Docs website.

If you have any questions about the program, please email the Season of
Docs team at season-of-docs-supp...@googlegroups.com.

Best,
The Google Season of Docs team


Title: Open Source organization maintains the fluctuations of any kind of
developing projects. Usually related to the innovative skills and ideas in
that project generally.
Project length: Standard length (3 months)

Writer information
*Name:* Shivkumar Tiwari
*Email:* tsanket9...@gmail.com
*Résumé/CV:* https://www.linkedin.com/in/shivkumar-tiwari-746987191
*Sample:* https://www.linkedin.com/in/shivkumar-tiwari-746987191

Project Description
Technology is the key to success; that's why people need random
innovations in daily life. By writing something techniques isn't be
possible to explore someone.


Re: Finer-grained test runs?

2020-07-10 Thread Kenneth Knowles
On Thu, Jul 9, 2020 at 1:44 PM Robert Bradshaw  wrote:

> I wonder how hard it would be to track greenness and flakiness at the
> level of gradle project (or even lower), viewed hierarchically.
>

Looks like this is part of the Gradle Enterprise Tests Dashboard offering:
https://gradle.com/blog/flaky-tests/
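The hierarchical view Robert describes can also be approximated without Gradle Enterprise. A minimal Python sketch, assuming we can export per-run records of (Gradle project path, test name, commit, pass/fail); `RunRecord` and `flakiness_by_project` are hypothetical names, and "flaky" is taken here to mean a test that both passed and failed on the same commit. Counts roll up from each project to every ancestor path, so `:sdks:java` aggregates everything beneath it.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class RunRecord:
    project: str  # Gradle project path, e.g. ":sdks:java:core" (hypothetical record)
    test: str     # fully qualified test name
    commit: str   # commit the run was executed against
    passed: bool


def ancestors(project: str) -> List[str]:
    """':sdks:java:core' -> [':sdks', ':sdks:java', ':sdks:java:core']."""
    parts = [p for p in project.split(":") if p]
    return [":" + ":".join(parts[: i + 1]) for i in range(len(parts))]


def flakiness_by_project(runs: Iterable[RunRecord]) -> Dict[str, Tuple[int, int]]:
    """Return {project: (flaky_tests, total_tests)}, rolled up to every ancestor.

    A test counts as flaky if, on at least one commit, it both passed and failed.
    """
    # Collect the set of outcomes seen for each (project, test, commit).
    outcomes: Dict[Tuple[str, str, str], Set[bool]] = defaultdict(set)
    for r in runs:
        outcomes[(r.project, r.test, r.commit)].add(r.passed)

    flaky: Set[Tuple[str, str]] = set()
    seen: Set[Tuple[str, str]] = set()
    for (project, test, _commit), results in outcomes.items():
        seen.add((project, test))
        if results == {True, False}:
            flaky.add((project, test))

    # Attribute each distinct test to its project and all ancestor projects.
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for project, test in seen:
        for node in ancestors(project):
            totals[node][1] += 1
            if (project, test) in flaky:
                totals[node][0] += 1
    return {k: (v[0], v[1]) for k, v in totals.items()}
```

With one test that flipped on commit `c1` under `:sdks:java:core` and stable tests elsewhere, `:sdks:java:core` reports (1, 2) while `:sdks:java:io:kinesis` stays at (0, 1), which is the kind of per-subtree signal that could separate core flakiness from IO flakiness.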

Kenn

> > Recall my (non-binding) starting point guessing at what tests should or
> > should not run in some scenarios: (this tangent is just about the third
> > one, where I explicitly said maybe we run all the same tests and then we
> > want to focus on separating signals as Luke pointed out)
> >
> > > - changing an IO or runner would not trigger the 20 minutes of core
> > >   SDK tests
> > > - changing a runner would not trigger the long IO local integration
> > >   tests
> > > - changing the core SDK could potentially not run as many tests in
> > >   presubmit, but maybe it would and they would be separately reported
> > >   results with clear flakiness signal
> >
> > And let's consider even more concrete examples:
> >
> >  - when changing a Fn API proto, how important is it to run
> >    RabbitMqIOTest?
> >  - when changing JdbcIO, how important is it to run the Java SDK
> >    needsRunnerTests? RabbitMqIOTest?
> >  - when changing the FlinkRunner, how important is it to make sure that
> >    Nexmark queries still match their models when run on direct runner?
> >
> > I chose these examples to all have zero value, of course. And I've
> > deliberately included an example of a core change and a leaf test. Not all
> > (core change, leaf test) pairs are equally important. The vast majority of
> > all tests we run are literally unable to be affected by the changes
> > triggering the test. So that's why enabling Gradle cache or using a plugin
> > like Brian found could help part of the issue, but not the whole issue,
> > again as Luke reminded.
>
> For (2) and (3), I would hope that the build dependency graph could
> exclude them. You're right about (1) (and I've hit that countless
> times), but would rather err on the side of accidentally running too
> many tests than not enough. If we make manual edits to what can be
> inferred by the build graph, let's make it a blacklist rather than an
> allow list to avoid accidental lost coverage.
>
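Robert's suggestion (infer what to run from the build dependency graph, with manual edits only as exclusions) amounts to a reverse-dependency walk. A minimal Python sketch, not Beam's actual build logic: it assumes the Gradle project graph has been exported as a dict mapping each project to its direct dependencies, and `affected_projects` plus the project names in the usage note are illustrative. Exclusions are subtracted after the walk, so an excluded project only loses its own tests; projects downstream of it are still reached, erring toward running too many tests.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set


def affected_projects(
    deps: Dict[str, Set[str]],
    changed: Iterable[str],
    excludes: Dict[str, Set[str]],
) -> Set[str]:
    """Return the projects whose tests should run for a change.

    deps maps each project to the projects it directly depends on.
    Every project that transitively depends on a changed project is affected,
    minus the (changed project -> excluded projects) entries in `excludes`.
    """
    # Invert the graph: project -> projects that depend on it directly.
    dependents: Dict[str, Set[str]] = defaultdict(set)
    for project, upstream in deps.items():
        for u in upstream:
            dependents[u].add(project)

    result: Set[str] = set()
    for start in changed:
        # Breadth-first walk over everything downstream of the changed project.
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for d in dependents[node]:
                if d not in seen:
                    seen.add(d)
                    queue.append(d)
        result |= seen - excludes.get(start, set())
    return result
```

For example, if JdbcIO, RabbitMQ and the Flink runner all depend on the Java core, which depends on the pipeline model, a JdbcIO-only change selects just JdbcIO, while a model change selects everything except a RabbitMQ project explicitly excluded for that change.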
> > We make these tradeoffs all the time, of course, via putting some tests
> > in *IT and postCommit runs and some in *Test, implicitly preCommit. But I
> > am imagining a future where we can decouple the test suite definitions
> > (very stable, not depending on the project context) from the decision of
> > where and when to run them (less stable, changing as the project changes).
> >
> > My assumption is that the project will only grow and all these problems
> > (flakiness, runtime, false coupling) will continue to get worse. I raised
> > this now so we could consider what is a steady state approach that could
> > scale, before it becomes an emergency. I take it as a given that it is
> > harder to change culture than it is to change infra/code, so I am not
> > considering any possibility of more attention to flaky tests or more
> > attention to testing the core properly or more attention to making tests
> > snappy or more careful consideration of *IT and *Test. (unless we build
> > infra that forces more attention to these things)
> >
> > Incidentally, SQL is not actually fully factored out. If you edit SQL it
> > runs a limited subset defined by :sqlPreCommit. If you edit core, then
> > :javaPreCommit still includes SQL tests.
>
> I think running SQL tests when you edit core is not actually that bad.
> Possibly better than not running any of them. (Maybe, as cost becomes
> more of a concern, adding the notion of "smoke tests" that are a cheap
> subset run when upstream projects change would be a good compromise.)
>
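The "smoke tests" compromise could sit on top of a dependency-based selection. A minimal Python sketch, assuming each project declares a full test list and, optionally, a cheaper smoke subset (both hypothetical inputs, as is `select_tests`): projects edited directly get their full suite, projects reached only through an upstream change get their smoke subset, and a project with no declared subset falls back to its full suite rather than running nothing.

```python
from typing import Dict, List, Set


def select_tests(
    affected: Set[str],
    changed: Set[str],
    full: Dict[str, List[str]],
    smoke: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Plan which tests to run per affected project.

    Full suite for projects edited directly; smoke subset for projects
    affected only through an upstream change.
    """
    plan: Dict[str, List[str]] = {}
    for project in sorted(affected):
        if project in changed:
            plan[project] = full.get(project, [])
        else:
            # No smoke subset declared: fall back to the full suite,
            # erring on the side of running too many tests.
            plan[project] = smoke.get(project, full.get(project, []))
    return plan
```

With a core edit, core would run everything while SQL would run only its smoke subset, roughly the "running SQL tests when you edit core" trade-off discussed above, at lower cost.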


Re: KinesisIO Tests - are they run anywhere?

2020-07-10 Thread Alexey Romanenko
I think that we should get back to this question, since by now we have more
and more AWS-related IO connectors (mostly in the Java SDK, afaik).

It would be great to have dedicated Beam credentials to run all our
AWS-related ITs against real AWS instances, but until then, I’m +1 to run such
tests against 3rd-party implementations, for example “localstack”, as the most
comprehensive one. Of course, we may observe some discrepancies in behaviour
between real AWS and other implementations, but as long as they are not
fundamental, that should not be a blocker. I believe that regularly running
ITs, especially for IO connectors, is very important.
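A minimal sketch of the fallback proposed above, assuming the ITs can be pointed at an arbitrary service endpoint: use real AWS when credentials are supplied through the standard `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` environment variables, otherwise target a local localstack instance. `AwsItTarget` and `choose_target` are hypothetical, not part of Beam. `http://localhost:4566` assumes localstack's single edge port in recent releases (older versions used per-service ports), and the `us-east-1` default region is an arbitrary choice.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: localstack's edge port in recent releases; adjust for your version.
LOCALSTACK_ENDPOINT = "http://localhost:4566"


@dataclass(frozen=True)
class AwsItTarget:
    endpoint: Optional[str]  # None means "use the SDK's default AWS endpoint"
    access_key: str
    secret_key: str
    region: str


def choose_target(env: Mapping[str, str]) -> AwsItTarget:
    """Use real AWS when credentials are provided, else fall back to localstack."""
    key = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    region = env.get("AWS_REGION", "us-east-1")
    if key and secret:
        return AwsItTarget(None, key, secret, region)
    # Assumed: localstack does not validate credentials by default,
    # so dummy values are enough.
    return AwsItTarget(LOCALSTACK_ENDPOINT, "test", "test", region)
```

In a test harness this would be called as `choose_target(os.environ)`, with the resulting endpoint handed to whatever client builder the IT uses; keeping the choice in one place means the same IT code runs in CI against localstack and, with credentials, against real AWS.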

> On 9 Jul 2020, at 22:18, Luke Cwik  wrote:
> 
> It has come up a few times[1, 2, 3, 4] and there have also been a few 
> comments over time about whether someone could donate AWS resources to the 
> project.
> 
> 1: https://issues.apache.org/jira/browse/BEAM-601
> 2: https://issues.apache.org/jira/browse/BEAM-3373
> 3: https://issues.apache.org/jira/browse/BEAM-3550
> 4: https://issues.apache.org/jira/browse/BEAM-3032
> 
> On Thu, Jul 9, 2020 at 1:02 PM Mani Kolbe wrote:
> Have you guys considered using localstack to run AWS service based 
> integration tests?
> 
> https://github.com/localstack/localstack 
> 
> 
> On Thu, 9 Jul, 2020, 5:25 PM Piotr Szuberski wrote:
> Yeah, I meant KinesisIOIT tests. I'll do the same with the cross-language
> ITs then. Thanks for your reply :)
> 
> On 2020/07/08 17:13:11, Alexey Romanenko wrote:
> > If you mean Java KinesisIO tests, then unit tests are running on Jenkins
> > [1] and ITs are not run since they require AWS credentials, which we don’t
> > currently have dedicated to Beam.
> >
> > At the same time, you can run KinesisIOIT with your own credentials, like
> > we do in Talend (a company that I work for).
> > 
> > [1] 
> > https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/12209/testReport/org.apache.beam.sdk.io.kinesis/
> > 
> > > On 8 Jul 2020, at 13:11, Piotr Szuberski wrote:
> > >
> > > I'm writing a KinesisIO external transform with a Python wrapper and I
> > > found that the tests aren't executed anywhere in Jenkins. Am I wrong, or
> > > is there a reason for that?
> > 
> > 



Re:

2020-07-10 Thread Maximilian Michels

Welcome Emily! Looking forward to your questions.

Cheers,
Max

On 08.07.20 20:07, Emily Ye wrote:

Greetings, dev@beam! Just wanted to introduce myself - I'm a SWE at Google who 
will be contributing to Beam going forward. I'm pretty new to the data 
processing space but I'm excited to learn, and will probably be asking lots of 
questions here. Looking forward to getting to know the community!

- Emily