Re: master on Dataflow with schema aware PCollections stuck

2020-01-12 Thread Reuven Lax
Can you elucidate? All BeamSQL pipelines use schemas and I believe those tests are working just fine on the Dataflow runner. In addition, there are a number of ValidatesRunner schema-aware pipelines that are running regularly on the Dataflow runner. On Sun, Jan 12, 2020 at 1:43 AM Alex Van Boxel

Re: outputWithTimestamp

2020-01-12 Thread Reuven Lax
Semantically though, since you want the CalendarWindow aggregation to be based on login timestamps, the watermark should be tracking the login timestamps. The watermark is a way for the CalendarWindow to know that as far as the system knows, there will be no more events that fall into that window.

Re: [DISCUSS] Python static type checkers

2020-01-12 Thread Chad Dombrova
Hi folks, I agree with Robert that we need to wait and see before making any decisions, but I do have some opinions about the probable/desired outcome. I haven't used pytype, but my experience working with mypy over the past few years -- and following various issues and PEPs related to it and

Re: Jenkins jobs not running for my PR 10438

2020-01-12 Thread Tomo Suzuki
Hi Beam committers, Four Jenkins jobs did not report back for this PR https://github.com/apache/beam/pull/10554 . Can somebody trigger them? On Fri, Jan 10, 2020 at 4:51 PM Andrew Pilloud wrote: > > Done. > > On Fri, Jan 10, 2020 at 12:59 PM Tomo Suzuki wrote: >> >> Hi Beam developers, >> >> I

Re: outputWithTimestamp

2020-01-12 Thread Aaron Dixon
Reuven thanks -- I understand each point although I'm trying to grapple with your concerns expressed in #3; they don't seem avoidable even w/o the allowedSkew feature. Considering your response I see a revision to my solution that omits using the allowed skew configuration but as far as I can

Re: outputWithTimestamp

2020-01-12 Thread Reuven Lax
A few comments: 1. Yes, this already works on Dataflow (at Beam head). Flink support is pending at pr/10534. 2. Just to make sure we're on the same page: getAllowedTimestampSkew is _not_ about outputting behind the watermark. Rather it's about outputting a timestamp that's less than the current

Re: Executing the runner validation tests for the Twister2 runner

2020-01-12 Thread Pulasthi Supun Wickramasinghe
Hi Kenn, Is there any documentation that needs to accompany the new runner in the pull request or is the documentation added after the pull request is approved? It would be great if you can point me in the right direction regarding this. Best Regards, Pulasthi On Mon, Jan 6, 2020 at 9:56 PM

Re: outputWithTimestamp

2020-01-12 Thread Aaron Dixon
Reuven thanks for your insights so far. Just wanted to press a little more on the deprecation question as I'm still (so far) convinced that my use case is quite a straightforward justification (I'm looking for confirmation or correction to my thinking here.) I've simplified my use case a bit if it

Beam Summit North America 2019 - recordings

2020-01-12 Thread Matthias Baetens
Hi everyone, It's our pleasure to share the recordings from the Beam Summit North America 2019. Please find them in the YouTube playlist on the Apache Beam channel

Re: master on Dataflow with schema aware PCollections stuck

2020-01-12 Thread Alex Van Boxel
BTW, this is not a support ticket; I'm wondering if we are aware of this and whether we're missing schema-aware integration tests as well. Alex Van Boxel On Sun, Jan 12, 2020 at 10:43 AM Alex Van Boxel wrote: > Hey all, > > anyone tried master with a *schema aware pipeline* on Dataflow? I'm > testing

master on Dataflow with schema aware PCollections stuck

2020-01-12 Thread Alex Van Boxel
Hey all, anyone tried master with a *schema aware pipeline* on Dataflow? I'm testing some PRs to see if they run on Dataflow (as they are working on Direct) but they got: Workflow failed. Causes: The Dataflow job appears to be stuck because no worker activity has been seen in the last 1h. You