Re: [PROPOSAL] Preparing for Beam 2.31.0 release

2021-06-23 Thread Andrew Pilloud
RC1 is mostly done; I'm just waiting on permissions to push the Docker images. I filed a JIRA with INFRA (https://issues.apache.org/jira/browse/INFRA-22019); if anyone could help, please let me know. Andrew On Fri, Jun 18, 2021 at 1:33 PM Andrew Pilloud wrote: > We had a few more release

Re: [DISCUSS] Do we have all the building block(s) to support iterations in Beam?

2021-06-23 Thread Jan Lukavský
Hi Kenn, thanks for the pointers, that is really interesting reading that probably could (should) be part of the Beam docs. On the other hand, Beam is no longer Dataflow-only - and that could mean that some of the concepts can be reiterated, possibly? I don't quite understand where is the

Re: [DISCUSS] Do we have all the building block(s) to support iterations in Beam?

2021-06-23 Thread Jan Lukavský
> If that timer is triggered by the watermark of the previous step and that watermark is being held up by the entire iteration, then this timer will never fire and the whole transform could deadlock. This was one reason for multi-dimensional watermarks - the timer can fire based on the

Re: [DISCUSS] Do we have all the building block(s) to support iterations in Beam?

2021-06-23 Thread Kenneth Knowles
Most of the theory is particularly well-treated in "Timely Dataflow" and "Differential Dataflow". There is a brief summary of the latter at https://blog.acolyer.org/2015/06/17/differential-dataflow/ but I recommend actually reading both papers. It uses clock ticks rather than Beam's continuous

Re: [DISCUSS] Do we have all the building block(s) to support iterations in Beam?

2021-06-23 Thread Reuven Lax
On Wed, Jun 23, 2021 at 2:33 PM Jan Lukavský wrote: > On 6/23/21 11:13 PM, Reuven Lax wrote: > > On Wed, Jun 23, 2021 at 2:00 PM Jan Lukavský wrote: > >> The most qualitatively important use case I see is ACID transactions - >> transactions naturally involve cycles, because the most natural

Re: [DISCUSS] Do we have all the building block(s) to support iterations in Beam?

2021-06-23 Thread Jan Lukavský
On 6/23/21 11:13 PM, Reuven Lax wrote: On Wed, Jun 23, 2021 at 2:00 PM Jan Lukavský wrote: The most qualitatively important use case I see is ACID transactions - transactions naturally involve cycles, because the most natural implementation would be of

Re: [DISCUSS] Do we have all the building block(s) to support iterations in Beam?

2021-06-23 Thread Reuven Lax
On Wed, Jun 23, 2021 at 2:00 PM Jan Lukavský wrote: > The most qualitatively important use case I see is ACID transactions - > transactions naturally involve cycles, because the most natural > implementation would be of something like "optimistic locking" where the > transaction is allowed to

Re: [DISCUSS] Do we have all the building block(s) to support iterations in Beam?

2021-06-23 Thread Jan Lukavský
The most qualitatively important use case I see is ACID transactions - transactions naturally involve cycles, because the most natural implementation would be of something like "optimistic locking" where the transaction is allowed to progress until a downstream "commit" sees a conflict, when it

Re: [DISCUSS] Do we have all the building block(s) to support iterations in Beam?

2021-06-23 Thread Jan Lukavský
BTW, iterations might break the (otherwise very useful) concept that elements arriving ON_TIME should stay ON_TIME throughout the complete computation. If an element needs an excessive number of iterations to complete, it _could_ be output late even though it arrived ON_TIME. But

Re: [DISCUSS] Do we have all the building block(s) to support iterations in Beam?

2021-06-23 Thread Reuven Lax
One question I have is whether the use cases for cyclic graphs overlap substantially with the use cases for event-time watermarks. Many of the uses I'm aware of are ML-type algorithms (e.g. clustering) or iterative algorithms on large graphs (connected components, etc.), and it's unclear how many

Re: [DISCUSS] Do we have all the building block(s) to support iterations in Beam?

2021-06-23 Thread Jan Lukavský
Reuven, can you please elaborate a little on that? Why do you need a watermark per iteration? Letting the watermark progress as soon as all the keys arriving before the upstream watermark terminate the cycle seems like a valid definition without the need to make the watermark multidimensional.

Re: [DISCUSS] Do we have all the building block(s) to support iterations in Beam?

2021-06-23 Thread Jan Lukavský
Right, one can "outsource" this functionality through an external source, but that is a sort-of hackish solution. The most serious problem is that it "disconnects" the watermark of the feedback loop, which can make it tricky to correctly compute the downstream watermark. The SDF approach seems to

Re: [DISCUSS] Do we have all the building block(s) to support iterations in Beam?

2021-06-23 Thread Luke Cwik
SDF isn't required as users already try to do things like this using UnboundedSource and Pubsub. On Wed, Jun 23, 2021 at 11:39 AM Reuven Lax wrote: > This was explored in the past, though the design started getting very > complex (watermarks of unbounded dimension, where each iteration has its

Flaky test issue report (29)

2021-06-23 Thread Beam Jira Bot
This is your daily summary of Beam's current flaky tests (https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20labels%20%3D%20flake) These are P1 issues because they have a major negative impact on the community and make it hard to

Re: [DISCUSS] Do we have all the building block(s) to support iterations in Beam?

2021-06-23 Thread Reuven Lax
This was explored in the past, though the design started getting very complex (watermarks of unbounded dimension, where each iteration has its own watermark dimension). At the time, the exploration petered out. On Wed, Jun 23, 2021 at 10:13 AM Jan Lukavský wrote: > Hi, > > I'd like to discuss a

P1 issues report (42)

2021-06-23 Thread Beam Jira Bot
This is your daily summary of Beam's current P1 issues, not including flaky tests (https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20priority%20%3D%20P1%20AND%20(labels%20is%20EMPTY%20OR%20labels%20!%3D%20flake). See

[DISCUSS] Do we have all the building block(s) to support iterations in Beam?

2021-06-23 Thread Jan Lukavský
Hi, I'd like to discuss a very rough idea. I didn't walk through all the corner cases and the whole idea has a lot of rough edges, so please bear with me. I was thinking about non-IO applications of splittable DoFn, and the main idea - and why it is called splittable - is that it can handle

Re: FileIO with custom sharding function

2021-06-23 Thread Jozef Vilcek
The difference, in my opinion, is in distinguishing between - as written in this thread - physical vs logical properties of the pipeline. I proposed to keep dynamic destination (logical) and sharding (physical) separate at the API level, as they are at the implementation level. When I reason about using `by()`