Re: RequiresStableInput on Spark runner

2020-07-08 Thread Jozef Vilcek
My last question was more towards the graph translation for batch mode. Should DoFn with @RequiresStableInput be translated/expanded in some specific way (e.g. DoFn -> Reshuffle + DoFn) or is it not needed for batch? Most runners fail in the presence of @RequiresStableInput for both batch and

Errorprone plugin fails for release branches <2.22.0

2020-07-08 Thread Alexey Romanenko
Hello, Some days ago I noticed that I can’t build the project from old release branches. For example, I wanted to build and run Spark Job Server from “release-2.20.0” branch and it failed: ./gradlew :runners:spark:job-server:runShadow --stacktrace * Exception is:

Re: RequiresStableInput on Spark runner

2020-07-08 Thread Jozef Vilcek
Would it then be safe to enable the same behavior for Spark batch? I can create a JIRA and patch for this, if there is no other reason not to do so. On Wed, Jul 8, 2020 at 11:51 AM Maximilian Michels wrote: > Correct, for batch we rely on re-running the entire job which will > produce stable

Beam Dependency Check Report (2020-07-08)

2020-07-08 Thread Apache Jenkins Server
High Priority Dependency Updates Of Beam Python SDK: Dependency Name Current Version Latest Version Release Date Of the Current Used Version Release Date Of The Latest Release JIRA Issue cachetools 3.1.1 4.1.1 2019-12-23

Re: Errorprone plugin fails for release branches <2.22.0

2020-07-08 Thread Maximilian Michels
Hi Alexey, I also came across this issue when building a custom Beam version. I applied the same fix (https://github.com/apache/beam/pull/11527) which you have mentioned. It appears that the Maven dependencies changed or are no longer available which causes the missing class files. +1 for

Re: RequiresStableInput on Spark runner

2020-07-08 Thread Maximilian Michels
Correct, for batch we rely on re-running the entire job which will produce stable input within each run. For streaming, the Flink Runner buffers all input to a @RequiresStableInput DoFn until a checkpoint is complete, only then it processes the buffered data. Dataflow effectively does the
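The buffering behavior described in the message above can be sketched in a few lines. This is an illustration of the idea only, not the Flink Runner's code: the class `StableInputBuffer`, its method names, and the `process_fn` callback are all hypothetical, and plain Python lists stand in for runner state.

```python
class StableInputBuffer:
    """Holds elements until a checkpoint completes, then processes them.

    Hypothetical sketch: process_fn stands in for the body of a DoFn
    annotated with @RequiresStableInput.
    """

    def __init__(self, process_fn):
        self._process_fn = process_fn
        self._pending = []   # elements received since the last checkpoint
        self.outputs = []    # results emitted after a checkpoint completes

    def on_element(self, element):
        # Do not process yet: the input is not known to be stable.
        self._pending.append(element)

    def on_checkpoint_complete(self):
        # The buffered elements are now part of a completed checkpoint,
        # so a retry would replay exactly the same elements.
        ready, self._pending = self._pending, []
        for element in ready:
            self.outputs.append(self._process_fn(element))


buf = StableInputBuffer(lambda x: x * 2)
buf.on_element(1)
buf.on_element(2)
before = list(buf.outputs)      # nothing is processed before the checkpoint
buf.on_checkpoint_complete()
after = list(buf.outputs)       # buffered elements are processed afterwards
```

The trade-off this shows is latency for stability: output is delayed by up to one checkpoint interval, in exchange for the guarantee that a retried bundle sees the same input.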

NanosInstant not being recognised by BigQueryIO.Write

2020-07-08 Thread Robert.Butcher
Hi All, I am posting this to the dev (as opposed to the user channel) as I believe it will be of interest to those working on either Schemas or BigQuery. I have a pipeline based on BEAM 2.22 that is ingesting data into BigQuery. Internally I am using protobuf for my domain model and the

KinesisIO Tests - are they run anywhere?

2020-07-08 Thread Piotr Szuberski
I'm writing a KinesisIO external transform with a Python wrapper and I found that the tests aren't executed anywhere in Jenkins. Am I wrong, or is there a reason for that?

Re: beam submit TFX on yarn

2020-07-08 Thread Kyle Weaver
Beam Python does not yet work with Spark on yarn. See https://issues.apache.org/jira/browse/BEAM-8970 for details. On Tue, Jul 7, 2020 at 8:52 PM sxqjq wrote: > > I forget, java can use spark-submit commit,but I use Python language > > > > - Original Message - > > > *From:* sxqjq > >

Re: [PROPOSAL] Preparing for Beam 2.23.0 release

2020-07-08 Thread Kyle Weaver
> I may need help with a Samza ValidatesRunner failure [1]. It has been failing since at least June 24 [2]. Looks like a duplicate of https://issues.apache.org/jira/browse/BEAM-10025. > 1. Did this issue come up during earlier releases? Yes, this affected the 2.21 and 2.22 releases. tl;dr it

Re: Errorprone plugin fails for release branches <2.22.0

2020-07-08 Thread Ismaël Mejía
I still don't understand how this happened. Was the dependency hosted in other place? Dependencies CAN NOT be removed from central to avoid these issues. https://central.sonatype.org/articles/2014/Feb/06/can-i-change-a-component-on-central/ The question is where was this dependency coming from?

Re: Errorprone plugin fails for release branches <2.22.0

2020-07-08 Thread Kenneth Knowles
I believe the hosting is https://plugins.gradle.org/m2/ On Wed, Jul 8, 2020 at 12:33 PM Ismaël Mejía wrote: > I still don't understand how this happened. Was the dependency hosted > in other place? > > Dependencies CAN NOT be removed from central to avoid these issues. > >

Re: Errorprone plugin fails for release branches <2.22.0

2020-07-08 Thread Alexey Romanenko
Hi Max, I’m +1 for back porting as well but that seems quite complicated since we distribute release source code from https://archive.apache.org/ Perhaps, we should just warn users about this issue and how to workaround it. Any other ideas? > On 8 Jul 2020, at 11:46, Maximilian Michels wrote:

Re: [PROPOSAL] Preparing for Beam 2.23.0 release

2020-07-08 Thread Valentyn Tymofieiev
Thank you, Kyle! On Wed, Jul 8, 2020 at 10:03 AM Kyle Weaver wrote: > > I may need help with a Samza ValidatesRunner failure [1]. It has been > failing since at least June 24 [2]. > > Looks like a duplicate of https://issues.apache.org/jira/browse/BEAM-10025 > . > > > 1. Did this issue come up

Re: KinesisIO Tests - are they run anywhere?

2020-07-08 Thread Alexey Romanenko
If you mean Java KinesisIO tests, then unit tests are running on Jenkins [1] and ITs are not running since it requires AWS credentials that we don’t have dedicated to Beam for the moment. At the same time, you can run KinesisIOIT with your own credentials, like we do in Talend (a company that

Re: Errorprone plugin fails for release branches <2.22.0

2020-07-08 Thread Kenneth Knowles
On Wed, Jul 8, 2020 at 12:07 PM Kyle Weaver wrote: > > To fix on previous release branches, we would need to make a new > release, is it not? Since hashes would change.. > > Would it be alright to patch the release branches on Github and leave the > released source as-is? Github release branches

Season of Docs Interest

2020-07-08 Thread Sharon Lin
Hi Aizhamal, I'm a 4th year bachelors student at MIT studying computer science, and I'm interested in working with Apache Beam for Season of Docs! I recognize that it's close to the application deadline, but I'm an avid user of Apache Spark and would really love to help with documenting tools for

Beam Summit Status Report - 7/8

2020-07-08 Thread Brittany Hermann
Hi folks, I wanted to provide you with the Beam Summit Status report from today's meeting. If you would like to join the next public meeting on Wednesday, July 22nd at 11:30 AM PST please let me know and I will send a calendar invite over to you! Also don't forget to register for the Summit

Re: Errorprone plugin fails for release branches <2.22.0

2020-07-08 Thread Pablo Estrada
Ah that's annoying that a dependency would be removed from maven. I thought that was not meant to happen? This must be an issue happening for many other projects... Why is errorprone a dependency anyway? To fix on previous release branches, we would need to make a new release, is it not? Since

Re: Errorprone plugin fails for release branches <2.22.0

2020-07-08 Thread Kyle Weaver
> To fix on previous release branches, we would need to make a new release, is it not? Since hashes would change.. Would it be alright to patch the release branches on Github and leave the released source as-is? Github release branches themselves aren't release artifacts, so I think it should be

[no subject]

2020-07-08 Thread Emily Ye
Greetings, dev@beam! Just wanted to introduce myself - I'm a SWE at Google who will be contributing to Beam going forward. I'm pretty new to the data processing space but I'm excited to learn, and will probably be asking lots of questions here. Looking forward to getting to know the community!

Re: Finer-grained test runs?

2020-07-08 Thread Kenneth Knowles
That's a good start. It is new enough and with few enough commits that I'd want to do some thorough experimentation. Our build is complex enough with a lot of ad hoc coding that we might end up maintaining whatever we choose... In my ideal scenario the list of "what else to test" would be

Re: Request for Java PR review

2020-07-08 Thread Robert Bradshaw
Yeah, the fact that not everyone can see suggested reviewers is annoying. Mostly I just wanted to call out that if you have a PR and haven't gotten feedback on it, it's totally kosher to ask someone specifically to be a reviewer, and this can often get the ball rolling quicker. (Pinging the list

Finer-grained test runs?

2020-07-08 Thread Kenneth Knowles
Hi all, I wanted to start a discussion about getting finer grained test execution more focused on particular artifacts/modules. In particular, I want to gather the downsides and impossibilities. So I will make a proposal that people can disagree with easily. Context: job_PreCommit_Java is a

Re: Finer-grained test runs?

2020-07-08 Thread Brian Hulette
> We could have one "test the things" Jenkins job if the underlying tool (Gradle) could resolve what needs to be run. I think this would be much better. Otherwise it seems our Jenkins definitions are just duplicating information that's already stored in the build.gradle files which seems

Re: Finer-grained test runs?

2020-07-08 Thread Luke Cwik
I'm not sure that breaking it up will be significantly faster since each module needs to build its ancestors and run tests of itself and all of its descendants which isn't a trivial amount of work. We have only so many executors and with the increased number of jobs, won't we just be waiting for
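The cost described in the message above (build a module's ancestors, then test the module and all of its descendants) can be made concrete with a small graph walk. This is a sketch under stated assumptions: the `DEPENDS_ON` graph is a made-up miniature, the project paths are only illustrative, and `ancestors`, `descendants`, and `plan` are hypothetical helpers rather than anything in Beam's build.

```python
from collections import deque

# Hypothetical module graph: each module maps to the modules it depends on.
DEPENDS_ON = {
    "sdks:java:core": [],
    "runners:core-java": ["sdks:java:core"],
    "runners:spark": ["runners:core-java"],
    "sdks:java:io:kinesis": ["sdks:java:core"],
}


def _reachable(start, edges):
    """Breadth-first search returning every node reachable from start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def ancestors(module):
    """Modules that `module` depends on, directly or transitively."""
    return _reachable(module, DEPENDS_ON)


def descendants(module):
    """Modules that depend on `module`, directly or transitively."""
    reverse = {}
    for mod, deps in DEPENDS_ON.items():
        for dep in deps:
            reverse.setdefault(dep, []).append(mod)
    return _reachable(module, reverse)


def plan(changed):
    """What to build and what to test when `changed` is modified."""
    return {
        "build": sorted(ancestors(changed)),
        "test": sorted({changed} | descendants(changed)),
    }


result = plan("runners:core-java")
core_result = plan("sdks:java:core")
```

In this toy graph a change to `runners:core-java` tests two modules, while a change to `sdks:java:core` tests all four, which is the concern raised above: for modules near the root, a finer-grained split saves little.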

Re: Finer-grained test runs?

2020-07-08 Thread Robert Bradshaw
On Wed, Jul 8, 2020 at 4:44 PM Luke Cwik wrote: > > I'm not sure that breaking it up will be significantly faster since each > module needs to build its ancestors and run tests of itself and all of its > descendants which isn't a trivial amount of work. We have only so many > executors and

Re: Request for Java PR review

2020-07-08 Thread Rui Wang
I didn't hear that has been changed so assume it's still only committers that can see suggested reviewers, thus picking up someone based on the source code history could be the feasible solution for non-committers. An improvement though could be when picking up someone from the history, you can

Re: [DISCUSS] ReadAll pattern and consistent use in IO connectors

2020-07-08 Thread Robert Bradshaw
OK, I'm +0 on this change. Using the PTransform as an element is probably better than duplicating the full API on another interface, and I think it's worth getting this unblocked. This will require a Read2 if we have to add options in an upgrade-compatible way. On Tue, Jul 7, 2020 at 3:19 PM Luke

Re: Finer-grained test runs?

2020-07-08 Thread Kenneth Knowles
I like your use of "ancestor" and "descendant". I will adopt it. On Wed, Jul 8, 2020 at 4:53 PM Robert Bradshaw wrote: > On Wed, Jul 8, 2020 at 4:44 PM Luke Cwik wrote: > > > > I'm not sure that breaking it up will be significantly faster since each > module needs to build its ancestors and