You can launch another Dataflow job from within an existing Dataflow job.
For all intents and purposes, Dataflow won't know that the jobs are related
in any way, so they will only be "nested" in the sense that your outer
pipeline knows about the inner pipeline.

You should be able to do this with any runner (provided you propagate all
runner/pipeline configuration through to the inner job), and you should even
be able to take a job on one runner and launch a job on a different runner
(though you'll have to deal with the complexity of bundling both runners and
their dependencies somehow).
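
For example, here is a minimal sketch with the Python SDK (the project and
bucket names are illustrative, and the inner job's options must be supplied
explicitly, since nothing is inherited from the outer job):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

class LaunchInnerJob(beam.DoFn):
    def process(self, element):
        # Nothing propagates from the outer job: runner, project,
        # temp_location, etc. all have to be configured again here.
        inner_options = PipelineOptions(
            runner='DataflowRunner',             # could be any runner
            project='my-project',                # illustrative
            temp_location='gs://my-bucket/tmp')  # illustrative
        inner = beam.Pipeline(options=inner_options)
        (inner
         | beam.Create([element])
         | beam.Map(lambda x: x))  # the inner job's real work goes here
        result = inner.run()
        result.wait_until_finish()
        # To Dataflow these are two unrelated jobs; only this DoFn
        # knows that they are "nested".
        yield str(result.state)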

There was some work investigating support for nested graphs within Apache
Beam, and for dynamic graph expansion during execution as a general concept.
This was meant to support use cases such as recursion and loops, but it
didn't progress much further than the idea-generation phase.

On Thu, Aug 16, 2018 at 9:47 AM Pascal Gula <[email protected]> wrote:

> Hi Robin,
> this is unfortunate news, but I had already anticipated such an answer
> and have an alternative implementation in mind.
> It would, however, be interesting to support such a feature, since I am
> probably not the first person asking for this.
> Best regards,
> Pascal
>
> On Thu, Aug 16, 2018 at 6:20 PM, Robin Qiu <[email protected]> wrote:
>
>> Hi Pascal,
>>
>> As far as I know, you can't create a sub-pipeline within a DoFn, i.e.
>> nested pipelines are not supported.
>>
>> Best,
>> Robin
>>
>> On Thu, Aug 16, 2018 at 7:03 AM Pascal Gula <[email protected]> wrote:
>>
>>> As a bonus, here is a simplified diagram view of the use-case:
>>>
>>> Cheers,
>>> Pascal
>>>
>>>
>>> On Thu, Aug 16, 2018 at 3:12 PM, Pascal Gula <[email protected]> wrote:
>>>
>>>> Hello,
>>>> I am currently evaluating Apache Beam (to be executed later on Google
>>>> Dataflow), and for the first use case I am working on, I have a design
>>>> question to see if any of you have already faced a similar one.
>>>> Namely, we have a DB describing dashboard views, and for each view we
>>>> would like to perform some aggregation transform.
>>>> My first approach would be to create a higher-level pipeline that
>>>> fetches all view configurations from our mongoDB (BTW, we released a
>>>> mongoDB IO connector here: https://pypi.org/project/beam-extended/).
>>>> With this views PCollection, the idea is to have a ParDo whose DoFn
>>>> creates a sub-pipeline to perform the aggregation on data from our
>>>> plant database, with a query derived from the view configuration.
>>>> Afterwards, the idea is to save, for the higher-level pipeline, some
>>>> performance/data metrics related to the execution of the array of
>>>> sub-pipelines, roughly along the lines of the sketch below.
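>>>>
>>>> A rough sketch of the shape I have in mind (the Creates are stand-ins
>>>> for the mongoDB and plant-database reads; all names are illustrative):
>>>>
>>>> import apache_beam as beam
>>>>
>>>> class AggregatePerView(beam.DoFn):
>>>>     def process(self, view):
>>>>         # One sub-pipeline per view configuration.
>>>>         sub = beam.Pipeline()
>>>>         (sub
>>>>          | beam.Create([view['query']])  # stand-in for plant-DB read
>>>>          | beam.combiners.Count.Globally())
>>>>         result = sub.run()
>>>>         result.wait_until_finish()
>>>>         # Metrics for the higher-level pipeline to save.
>>>>         yield {'view': view['name'], 'state': str(result.state)}
>>>>
>>>> with beam.Pipeline() as p:
>>>>     (p
>>>>      | 'Views' >> beam.Create(  # stand-in for the mongoDB read
>>>>            [{'name': 'dashboard-1', 'query': '{"crop": "tomato"}'}])
>>>>      | 'RunSubPipelines' >> beam.ParDo(AggregatePerView())
>>>>      | 'SaveMetrics' >> beam.Map(print))
>>>>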
>>>> The main question is: are nested pipelines supported by the runner?
>>>> I hope my description was clear enough. I will work on a diagram view
>>>> meanwhile.
>>>> Very best regards,
>>>> Pascal
>>>>
>>>> --
>>>>
>>>> Pascal Gula
>>>> Senior Data Engineer / Scientist
>>>> +49 (0)176 34232684 | www.plantix.net
>>>> PEAT GmbH
>>>> Kastanienallee 4
>>>> 10435 Berlin // Germany
>>>>
>>>>
>>>
>>>
>
>
