Re: Jenkins jobs not running for my PR 10438

2020-01-13 Thread Yifan Zou
done. On Sun, Jan 12, 2020 at 6:27 PM Tomo Suzuki wrote: > Hi Beam committers, > > Four Jenkins jobs did not report back for this PR > https://github.com/apache/beam/pull/10554 . > Can somebody trigger them? > > On Fri, Jan 10, 2020 at 4:51 PM Andrew Pilloud > wrote: > > > > Done. > > > > On

Beam Dependency Check Report (2020-01-13)

2020-01-13 Thread Apache Jenkins Server
High Priority Dependency Updates Of Beam Python SDK: Dependency Name Current Version Latest Version Release Date Of the Current Used Version Release Date Of The Latest Release JIRA Issue cachetools 3.1.1 4.0.0 2019-12-23

Jenkins job execution policy

2020-01-13 Thread Michał Walenia
Hi, I wanted to decouple the conversation about solutions to the issue from job execution requests. We have 131 open PRs right now and 64 committers with job running privileges. From what I counted, more than 80 of those PRs are not authored by committers. I think that having committers answer

Re: outputWithTimestamp

2020-01-13 Thread Aaron Dixon
Reuven, thank you much for your help and the clarity here, it's very helpful. Per your solution #2 -- This approach makes sense, seems semantically right, and is something I'll explore when the timer.withOutputTimestamp(t) releases. Just for clarity, there is no other way in Beam

Re: master on Dataflow with schema aware PCollections stuck

2020-01-13 Thread Reuven Lax
I don't think that should be the case. Also SchemaCoder will automatically set the UUID for such logical types. On Mon, Jan 13, 2020 at 8:24 AM Alex Van Boxel wrote: > OK, I've rechecked everything and eventually found the problem. The > problem is when you use a LogicalType backed by a Row,

Re: Ask about beam pull requests

2020-01-13 Thread Pablo Estrada
... what that means is that you can tag me on github, and I'll take a look, yes : ) I'm 'pabloem'. On Mon, Jan 13, 2020 at 9:59 AM Pablo Estrada wrote: > I reviewed the first PR, so I'm happy to review others. > > On Mon, Jan 13, 2020 at 9:42 AM Robert Bradshaw > wrote: > >> One thing you

Re: master on Dataflow with schema aware PCollections stuck

2020-01-13 Thread Reuven Lax
SchemaCoder today recursively sets UUIDs for all schemas, including logical types, in setSchemaIds. Is it possible that your changes modified that logic somehow? On Mon, Jan 13, 2020 at 9:39 AM Alex Van Boxel wrote: > This is the stacktrace: > > > java.lang.IllegalStateException at >

Re: Jenkins jobs not running for my PR 10438

2020-01-13 Thread Tomo Suzuki
Thanks Yifan (but Java Precommit is still missing). Can somebody run "Run Java PreCommit" on https://github.com/apache/beam/pull/10554? On Mon, Jan 13, 2020 at 2:59 AM Yifan Zou wrote: > > done. > > On Sun, Jan 12, 2020 at 6:27 PM Tomo Suzuki wrote: >> >> Hi Beam committers, >> >> Four Jenkins

Re: Beam Summit North America 2019 - recordings

2020-01-13 Thread Pablo Estrada
Thanks Matthias! On Sun, Jan 12, 2020 at 7:51 AM Matthias Baetens wrote: > Hi everyone, > > It's our pleasure to share the recordings from the Beam Summit North > America 2019. > Please find them in the YouTube playlist >

[RESULT] [VOTE] Vendored Dependencies Release

2020-01-13 Thread Luke Cwik
[RESULT] [VOTE] Vendored Dependencies Release I'm happy to announce that we have unanimously approved this release. There are 6 approving votes, 4 of which are binding: * Luke Cwik * Pablo Estrada * Ahmet Altay * Kenneth Knowles There are no disapproving votes. Thanks everyone! On 2020/01/11

Re: master on Dataflow with schema aware PCollections stuck

2020-01-13 Thread Alex Van Boxel
This is the stacktrace: java.lang.IllegalStateException at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState(Preconditions.java:491) at org.apache.beam.sdk.coders.RowCoderGenerator.getCoder(RowCoderGenerator.java:380) at

Re: outputWithTimestamp

2020-01-13 Thread Reuven Lax
Correct. This API is merged into Beam, so it should be included in the next Beam release. On Mon, Jan 13, 2020 at 4:00 AM Aaron Dixon wrote: > Reuven, thank you much for your help and the clarity here, it's very > helpful. > > Per your solution #2 -- This approach makes sense, seems semantically

Ask about beam pull requests

2020-01-13 Thread Keunhyun Oh
I want to make a pull request for BEAM-9094 (https://issues.apache.org/jira/browse/BEAM-9094). My tree is at https://github.com/ocworld/beam/tree/BEAM-9094-add-aws-s3-options and, when creating the pull request, I need to assign reviewers. Who can review my request?

Re: Ask about beam pull requests

2020-01-13 Thread Robert Bradshaw
One thing you could do is look up the history [1] of the file and see if there are any possible candidates (e.g. Apache Beam committers [2]). [1] https://github.com/ocworld/beam/blame/259f6174ce52e6317a5b4fe7ed3a126153d3/sdks/python/apache_beam/io/aws/clients/s3/boto3_client.py [2]

Failing Java PostCommit for Dataflow runner

2020-01-13 Thread Kirill Kozlov
Hello everyone! I have noticed that Jenkins tests for Dataflow runner [1] are failing with a runtime exception. It looks like the issue originated here [2], failed Dataflow job [3]. We should look into fixing it. Failing test: :runners:google-cloud-dataflow-java:validatesRunnerLegacyWorkerTest »

Re: Jenkins jobs not running for my PR 10438

2020-01-13 Thread Ismaël Mejía
done On Mon, Jan 13, 2020 at 2:39 PM Yoshiki Obata wrote: > Hi Beam committers > > It would be appreciated if anyone could trigger python precommit job to > this PR: > https://github.com/apache/beam/pull/10141 > > Regards, > Yoshiki >

Re: master on Dataflow with schema aware PCollections stuck

2020-01-13 Thread Alex Van Boxel
OK, I've rechecked everything and eventually found the problem. The problem is that when you use a LogicalType backed by a Row, the UUID needs to be set to make it work (this is the case for Proto-based Timestamps). I'll create a fix. _/ _/ Alex Van Boxel On Mon, Jan 13, 2020 at 8:36 AM

Re: Jenkins jobs not running for my PR 10438

2020-01-13 Thread Yoshiki Obata
Hi Beam committers It would be appreciated if anyone could trigger python precommit job to this PR: https://github.com/apache/beam/pull/10141 Regards, Yoshiki

NYC ? (or more generally East Coast)

2020-01-13 Thread Austin Bennett
Hi Devs and Users, We are looking for speakers for future Meetups and Events. Who is building cool things with Beam? We are looking at hosting a Meetup at Spotify in February, and ideally keep some meetups going throughout the year. For this to occur, we need to hear about what people are

Re: Failing Java PostCommit for Dataflow runner

2020-01-13 Thread Luke Cwik
This is being tracked in BEAM-9083 On Mon, Jan 13, 2020 at 11:23 AM Boyuan Zhang wrote: > Thanks Kirill! I'm going to look into it. > > On Mon, Jan 13, 2020 at 11:18 AM Kirill Kozlov > wrote: > >> Hello everyone! >> >> I have noticed that Jenkins tests for Dataflow runner [1] are failing >>

Re: No AfterWatermark firings in Dataflow

2020-01-13 Thread Rui Wang
If it indeed happened as you have described, I will be very interested in the expected behaviour. Something I remember from before: meeting the trigger condition just gives the runner/engine "permission" to fire, but the runner/engine may not fire immediately. But I don't know if the engine/runner will

Re: master on Dataflow with schema aware PCollections stuck

2020-01-13 Thread Alex Van Boxel
So I think the following happens: 1. the schema tree is initialized at construction time. The tree gets serialized and sent to the workers 2. the workers deserialize the tree, but as the Timestamp logical type has a logical type with a *static* schema the schema will be

No AfterWatermark firings in Dataflow

2020-01-13 Thread Aaron Dixon
I have the following trigger: .apply(Window.configure().triggering(AfterWatermark.pastEndOfWindow().withEarlyFirings(AfterPane.elementCountAtLeast(1))).accumulatingFiredPanes().withAllowedLateness(Duration.ZERO) But in Dataflow

Re: No AfterWatermark firings in Dataflow

2020-01-13 Thread Luke Cwik
I would have expected an empty on time pane since the default on time behavior is FIRE_ALWAYS. On Mon, Jan 13, 2020 at 1:54 PM Aaron Dixon wrote: > Can anyone confirm? > > This is intermittent. Some (it seems, sparse) windows don't get an ON_TIME > firing after watermark. Is this a bug or is

Re: No AfterWatermark firings in Dataflow

2020-01-13 Thread Aaron Dixon
Yes. I'm using calendar day-based windows and the watermark is completely caught up to today ... the calendar window ended several days ago. I got EARLY panes for each element but never an ON_TIME pane. On Mon, Jan 13, 2020 at 4:16 PM Luke Cwik wrote: > Is the watermark advancing past the end of the window? >

Re: BigQueryUtils improvements for Avro Bytes / Timestamp (millis)

2020-01-13 Thread Ryan Berti
Hello, I've got a PR that affects the java gcp component, specifically BigQueryUtils. Can anyone help me with a review? I've tagged the owner of the component on the PR but haven't heard anything for a week, so I figured I'd send an e-mail to this list.

Re: master on Dataflow with schema aware PCollections stuck

2020-01-13 Thread Brian Hulette
I guess these are the first logical types we've defined with a base type of row. It does seem reasonable that a static schema for a logical type could have some fixed id, but it feels odd to have a fixed UUID, it would be nice if we could give the schema some meaningful static identifier. I think

Re: NYC ? (or more generally East Coast)

2020-01-13 Thread Suneel Marthi
I can do talks in either DC or NYC meetups. I can coordinate with CapitalOne to see if they would be willing to host the DC meetup. On Mon, Jan 13, 2020 at 4:02 PM Austin Bennett wrote: > Hi Devs and Users, > > We are looking for speakers for future Meetups and Events. Who is > building cool

Re: No AfterWatermark firings in Dataflow

2020-01-13 Thread Aaron Dixon
The window is not empty fwiw; it has elements; I get an early firing pane for the window but well after the watermark passes there is no ON_TIME pane. Would this be a bug in Dataflow? Seems fundamental, so I'm concerned perhaps the Beam spec doesn't obligate ON_TIME firings? On Mon, Jan 13,

Re: No AfterWatermark firings in Dataflow

2020-01-13 Thread Luke Cwik
Is the watermark advancing past the end of the window? On Mon, Jan 13, 2020 at 2:02 PM Aaron Dixon wrote: > The window is not empty fwiw; it has elements; I get an early firing pane > for the window but well after the watermark passes there is no ON_TIME > pane. Would this be a bug in Dataflow?

Re: Failing Java PostCommit for Dataflow runner

2020-01-13 Thread Kirill Kozlov
Thanks for taking care of this! On Mon, Jan 13, 2020 at 2:00 PM Boyuan Zhang wrote: > This problem is addressed by PR10564. Now all affected tests are back to > green. > > On Mon, Jan 13, 2020 at 1:11 PM Luke Cwik wrote: > >> This is being tracked in BEAM-9083 >> >> On Mon, Jan 13, 2020 at

[PROPOSAL] Leveraging SQL TableProviders for Cross-Language IOs

2020-01-13 Thread Brian Hulette
Hi everyone, I have a proposal that I think can unify two problem sets: 1) adding more IOs for Beam SQL, and 2) making more (Row-based) Java IOs available in Python as cross-language transforms The basic idea is to create a single cross-language transform that exposes all Beam SQL IOs via the

Re: No AfterWatermark firings in Dataflow

2020-01-13 Thread Aaron Dixon
Any confirmation on this from anyone? Whether per Beam spec, runners are obligated to send ON_TIME panes for AfterWatermark triggers? I'm stuck because this seems fundamental, so it's hard to imagine this is a Dataflow bug, but OTOH it's also hard to imagine that trigger specs like AfterWatermark

Re: master on Dataflow with schema aware PCollections stuck

2020-01-13 Thread Alex Van Boxel
Fix in this PR: [BEAM-9113] Fix serialization proto logical types https://github.com/apache/beam/pull/10569 or we all agree to *promote* the logical types to top-level logical types (as described in the design document, see ticket): [BEAM-9037] Instant and duration as logical type

Re: No AfterWatermark firings in Dataflow

2020-01-13 Thread Aaron Dixon
Can anyone confirm? This is intermittent. Some (it seems, sparse) windows don't get an ON_TIME firing after watermark. Is this a bug or is there a reason to not expect ON_TIME firings for every window? On Mon, Jan 13, 2020 at 3:47 PM Rui Wang wrote: > If it indeed happened as you have

Re: Python IO Connector

2020-01-13 Thread Brian Hulette
Regarding cross-language and Beam rows (and SQL!) - I have a PR up [1] that adds an example script for using Beam's SqlTransform in Python by leveraging the portable row coder. Unfortunately I got stalled figuring out how to build/stage the Java artifacts for the SQL extensions so it hasn't been

Re: Jenkins jobs not running for my PR 10438

2020-01-13 Thread Tomo Suzuki
Thank you, Mark and Ismaël. On Mon, Jan 13, 2020 at 2:34 PM Mark Liu wrote: > > done > > On Mon, Jan 13, 2020 at 8:03 AM Tomo Suzuki wrote: >> >> Thanks Yifan (but Java Precommit is still missing). >> Can somebody run "Run Java PreCommit" on >> https://github.com/apache/beam/pull/10554? >> >>

Re: Jenkins jobs not running for my PR 10438

2020-01-13 Thread Mark Liu
done On Mon, Jan 13, 2020 at 8:03 AM Tomo Suzuki wrote: > Thanks Yifan (but Java Precommit is still missing). > Can somebody run "Run Java PreCommit" on > https://github.com/apache/beam/pull/10554? > > > On Mon, Jan 13, 2020 at 2:59 AM Yifan Zou wrote: > > > > done. > > > > On Sun, Jan 12,

Re: Failing Java PostCommit for Dataflow runner

2020-01-13 Thread Boyuan Zhang
This problem is addressed by PR10564. Now all affected tests are back to green. On Mon, Jan 13, 2020 at 1:11 PM Luke Cwik wrote: > This is being tracked in BEAM-9083 > > On Mon, Jan 13, 2020 at 11:23 AM Boyuan Zhang wrote: > >> Thanks Kirill! I'm going to look into it. >> >> On Mon, Jan 13,

Re: Go SplittableDoFn prototype and proposed changes

2020-01-13 Thread Luke Cwik
Thanks for the update and I agree with the points that you have made. On Fri, Jan 10, 2020 at 5:58 PM Robert Burke wrote: > Thank you for sharing Daniel! > > Resolving SplittableDoFns for the Go SDK even just as far as initial > splitting will take the SDK that much closer to exiting its

Re: master on Dataflow with schema aware PCollections stuck

2020-01-13 Thread Alex Van Boxel
It's indeed the first Logical identifier with a Row base type. The UUID is generated from the name of the class, but to do it in code (from a string) you need to create bytes from the string, then a UUID from those bytes. _/ _/ Alex Van Boxel On Mon, Jan 13, 2020 at 10:40 PM Brian Hulette wrote: > I guess
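
The string-to-bytes-to-UUID step described in this message can be sketched in plain Java roughly as follows. This is only an illustration using the JDK's `UUID.nameUUIDFromBytes`, not necessarily what the Beam fix does; the class `StableSchemaId`, the method `fromName`, and the class name string are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

public class StableSchemaId {

    // Derives a deterministic, name-based UUID from a string:
    // first turn the string into bytes, then build the UUID from those bytes.
    static UUID fromName(String name) {
        byte[] bytes = name.getBytes(StandardCharsets.UTF_8);
        return UUID.nameUUIDFromBytes(bytes);
    }

    public static void main(String[] args) {
        // Hypothetical class name, used only for illustration.
        String className = "com.example.TimestampLogicalType";

        UUID first = fromName(className);
        UUID second = fromName(className);

        // The same name yields the same UUID on every JVM, which is the
        // property a static schema identifier would need.
        System.out.println(first.equals(second));
        System.out.println(first);
    }
}
```

Because the value depends only on the input bytes, a pipeline constructor and a worker that both derive the identifier from the same name would arrive at the same UUID without having to serialize it.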

[VOTE] Release 2.18.0, release candidate #1

2020-01-13 Thread Udi Meiri
Hi everyone, Please review and vote on the release candidate #3 for the version 1.2.3, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) The complete staging area is available for your review, which includes: * JIRA release notes [1], *

Re: [DISCUSS] Python static type checkers

2020-01-13 Thread Udi Meiri
The most important gain would be compatibility with Google internal code. TLDR: I don't expect non-Googlers to fix pytype issues in Beam, nor would they have access to internal code that is validated against pytype with Beam. Pytype seems to detect attribute errors that mypy has not, so it acts

Re: [DISCUSS] Python static type checkers

2020-01-13 Thread Ahmet Altay
> The most important gain would be compatibility with Google internal code. I would like to clarify this. This refers to users of Beam who by default are using pytype as part of the toolchain. Even though they are internal to a single company and not vocal on Beam, they still represent a large

Re: No AfterWatermark firings in Dataflow

2020-01-13 Thread Kenneth Knowles
On my phone, so I can't grab the jira so easily, but quickly: EARLY panes are "race condition equivalent" to ON_TIME panes. The early panes consume all the pending elements then the on time pane is "empty". This is WAI if it is what is causing it. You need to explicitly set

Re: [DISCUSS] Python static type checkers

2020-01-13 Thread Chad Dombrova
> > Pytype seems to detect attribute errors that mypy has not, so it acts as a > kind-of linter in this case. > Examples: > > https://github.com/apache/beam/pull/10528/files#diff-0cb34b4622b0b7d7256d28b1ee1d52fc > >

Re: [DISCUSS] Python static type checkers

2020-01-13 Thread Chad Dombrova
> > > I agree with focusing on mypy for now, but I would propose soon after, > or in parallel if it will be different folks, to work on pytype and enable > it as a first class citizen similar to mypy. If there will be a large delta > between the two then we can decide on what to do next. > > If

Re: [VOTE] Release 2.18.0, release candidate #1

2020-01-13 Thread Valentyn Tymofieiev
There are some issues in this message, part of the message is still a template (1.2.3, TODO, MAVEN_VERSION). Before I noticed these issues, I ran a few Batch and Streaming Python 3.7 pipelines using Direct and Dataflow runners, and they all succeeded. On Mon, Jan 13, 2020 at 4:09 PM Udi Meiri

Re: [DISCUSS] Python static type checkers

2020-01-13 Thread Luke Cwik
I would rather we focus on doing well with one type checker and it seems that mypy is significantly more popular than pytype so it's more natural for users. I would support pytype if it covered more PEPs and was the newer and upcoming thing but that doesn't seem to be the case. On Sun, Jan 12,

Re: Jenkins job execution policy

2020-01-13 Thread Luke Cwik
I'm for going back to the status quo where anyone's PR ran the tests automatically or to the suggestion where users marked as contributors had their tests run automatically (with the documentation update about how to link your GitHub/JIRA accounts). On Mon, Jan 13, 2020 at 2:45 AM Michał Walenia

Re: No AfterWatermark firings in Dataflow

2020-01-13 Thread Aaron Dixon
Kenn, thank you! There is OnTimeBehavior (default FIRE_ALWAYS) and ClosingBehavior (default FIRE_IF_NON_EMPTY). Given that OnTimeBehavior is always-fire, shouldn't I see empty ON_TIME panes? Since my lateness config is 0, I'm going to try ClosingBehavior = FIRE_ALWAYS and see if I can rely on

Re: [DISCUSS] Python static type checkers

2020-01-13 Thread Kenneth Knowles
Looking at this from the outside, it seems like mypy is the obvious choice. Also running pytype could potentially be informative in some cases but only if there is a specific gap. What about maintenance/governance of the two projects? Kenn On Sun, Jan 12, 2020 at 7:48 PM Chad Dombrova wrote: >

Re: [DISCUSS] Python static type checkers

2020-01-13 Thread Kyle Weaver
Udi, what would we gain by using pytype? Also, has anyone tried running pytype against Beam? If it's not too much trouble, it might be helpful to diff the pytype and mypy results to get a feel for exactly how big the discrepancy is. On Mon, Jan 13, 2020 at 3:26 PM Kenneth Knowles wrote: >

Re: [PROPOSAL] Leveraging SQL TableProviders for Cross-Language IOs

2020-01-13 Thread Chamikara Jayalath
Thanks Brian. Added some comments. On Mon, Jan 13, 2020 at 2:25 PM Brian Hulette wrote: > Hi everyone, > I have a proposal that I think can unify two problem sets: > 1) adding more IOs for Beam SQL, and > 2) making more (Row-based) Java IOs available in Python as > cross-language transforms

Re: [DISCUSS] Python static type checkers

2020-01-13 Thread Robert Bradshaw
On Mon, Jan 13, 2020 at 5:34 PM Chad Dombrova wrote: >> >> Pytype seems to detect attribute errors that mypy has not, so it acts as a >> kind-of linter in this case. >> Examples: >> https://github.com/apache/beam/pull/10528/files#diff-0cb34b4622b0b7d7256d28b1ee1d52fc >>

Re: No AfterWatermark firings in Dataflow

2020-01-13 Thread Kenneth Knowles
This sounds like a bug, as described. Here's the logic, shared by all runners: https://github.com/apache/beam/blob/master/runners/core-java/src/main/java/org/apache/beam/runners/core/ReduceFnRunner.java#L958 Regarding "race condition equivalent" I mean that when you have an early trigger set up

Re: No AfterWatermark firings in Dataflow

2020-01-13 Thread Robert Bradshaw
I think AfterWatermark in particular should *always* produce an ON_TIME pane, regardless of whether there were early panes. (It's less clear with non-watermark triggers like after count or processing time.) This makes it feel like the on time behavior is a property of the trigger, not the windowing