Re: Iterative Pipelines in Beam Model

Dan Halperin Wed, 22 Jun 2016 13:26:10 -0700

In addition to Frances' suggestion, you can also put a fixed loop *inside*
a pipeline by unrolling it, i.e. by applying some iterative step N times.
This is a common approach in massively data-parallel systems for, e.g.,
PageRank -- assume that it will converge after 100 iterations, and run the
iterative step 100 times.


While definitely not as flexible as cycles, this is very useful.
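
A rough sketch of the unrolling idea, in plain Python rather than a Beam SDK:
the "pipeline" here is just an ordered list of stages, and all names
(pagerank_step, build_unrolled_pipeline, run) plus the 0.85 damping factor are
assumptions made for illustration. The point is that the loop runs while the
pipeline is being *built*, so the result is a cycle-free chain of N copies of
the step.

```python
# Stand-in for a pipeline: an ordered list of stages, built first, run later.
# Every name below is hypothetical; this is not the Beam API.

DAMPING = 0.85  # assumed damping factor, a conventional PageRank choice


def pagerank_step(links):
    """Return one stage: a function mapping ranks to updated ranks."""
    def stage(ranks):
        n = len(ranks)
        # Every page starts the round with the "random jump" share.
        new_ranks = {page: (1.0 - DAMPING) / n for page in ranks}
        for page, outgoing in links.items():
            if not outgoing:
                continue  # dangling pages contribute nothing in this sketch
            share = DAMPING * ranks[page] / len(outgoing)
            for target in outgoing:
                new_ranks[target] += share
        return new_ranks
    return stage


def build_unrolled_pipeline(links, num_iterations):
    """Unroll the loop at construction time: N stages in a straight chain."""
    return [pagerank_step(links) for _ in range(num_iterations)]


def run(stages, initial):
    """Execute the chain once, feeding each stage's output to the next."""
    data = initial
    for stage in stages:
        data = stage(data)
    return data


links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
initial = {page: 1.0 / len(links) for page in links}

# Assume convergence after 100 iterations, as in the text above.
stages = build_unrolled_pipeline(links, num_iterations=100)
final_ranks = run(stages, initial)
```

The iteration count is fixed before anything runs, which is exactly the
flexibility you give up compared to real cycles: there is no data-dependent
exit condition.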

On Wed, Jun 22, 2016 at 1:10 PM, Frances Perry <f...@google.com> wrote:

> Correct -- Beam pipelines currently have to be DAGs and can't
> support cycles.
>
> There's a JIRA issue to look into iterations (
> https://issues.apache.org/jira/browse/BEAM-106). The semantics get a
> little tricky given the unified model, so there's still quite a lot to
> figure out.
>
> In the meantime, you can maintain the loop outside your pipeline --
> essentially creating a new pipeline for each iteration.
>
>
>
> On Wed, Jun 22, 2016 at 11:36 AM, Eswar Reddy <eswarareddyad...@gmail.com>
> wrote:
>
>> Hi Beam Users,
>>
>> For my future projects, I anticipate having to create pipelines with
>> cycle(s). I could see that some engines (Flink
>> <https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/index.html#iterations>
>>  and Storm
>> <https://groups.google.com/forum/#!topic/storm-user/EjN1hU58Q_8>)
>> support this; they also give the user the flexibility to ensure the loop
>> ends for a given message by splitting & filtering streams. Could someone
>> share any pointers to an analogue of this in the Beam Model? I did see that
>> the documentation
>> <https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/Pipeline#class-pipeline>
>> says a Pipeline is always a DAG, though.
>>
>> Thanks,
>> Eswar.
>>
>
>
