Hi Lukasz,
thanks for the proposed solution. This was also one of the alternative
implementations that I had thought of.
When you talk about launching a job from another job, I understand this as
making a system call from a Python job and getting the result back by some
means (e.g. synchronously reading the output of the child jobs). Am I correct?
I'll first test this with the DirectRunner calling other DirectRunner(s),
and afterwards on GCP with Dataflow.
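To make sure we mean the same thing, here is a minimal sketch of what I have in mind, using only the Python standard library. It assumes that each child job is a separate script accepting its options as `--key=value` flags and printing one JSON document with its metrics on stdout. That output convention, the `view_id` option and both helper names are hypothetical and only serve the illustration; nothing here is a Beam API.

```python
import json
import subprocess
import sys


def options_to_args(options):
    """Turn a dict of options into --key=value flags (sorted for stable output)."""
    return ["--{}={}".format(key, value) for key, value in sorted(options.items())]


def run_child_job(command, options, timeout=None):
    """Run one child job synchronously and parse its stdout as JSON.

    Assumption: the child prints a single JSON document on stdout.
    This is a convention of this sketch, not something Beam provides.
    """
    completed = subprocess.run(
        command + options_to_args(options),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if the child exits non-zero
    )
    return json.loads(completed.stdout)


if __name__ == "__main__":
    # Stand-in child: echoes its flags back as JSON instead of running a pipeline.
    child = [
        sys.executable,
        "-c",
        "import json, sys; print(json.dumps({'args': sys.argv[1:]}))",
    ]
    print(run_child_job(child, {"runner": "DirectRunner", "view_id": "demo"}))
```

In the real setup the stand-in command would be replaced by the script that builds and runs the inner pipeline, and the outer job would call `run_child_job` once per view configuration, forwarding whatever runner/pipeline configuration the child needs.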
Regarding nested pipelines, I can help build a demonstrator
if I can get some support from the community.
Thanks again and very best regards,
Pascal


On Thu, Aug 16, 2018 at 8:43 PM, Lukasz Cwik <[email protected]> wrote:

> You can launch another Dataflow job from within an existing Dataflow job.
> For all intents and purposes, Dataflow won't know that the jobs are related
> in any way so they will only be "nested" because your outer pipeline knows
> about the inner pipeline.
>
> You should be able to do this for all runners (granted you need to
> propagate all runner/pipeline configuration through) and you should be able
> to take a job from one runner and launch a job on a different runner
> (you'll have to deal with the complexities of having two runners and their
> dependencies somehow though).
>
> There was some work investigating supporting nested graphs within Apache
> Beam and to support dynamic graph expansion during execution as a general
> concept. This was to support use cases such as recursion and loops but this
> didn't progress much more than the idea generation phase.
>
> On Thu, Aug 16, 2018 at 9:47 AM Pascal Gula <[email protected]> wrote:
>
>> Hi Robin,
>> this is unfortunate news, but I had already anticipated such an answer with
>> an alternative implementation.
>> It would, however, be interesting to support such a feature, since I am
>> probably not the first person asking for this.
>> Best regards,
>> Pascal
>>
>> On Thu, Aug 16, 2018 at 6:20 PM, Robin Qiu <[email protected]> wrote:
>>
>>> Hi Pascal,
>>>
>>> As far as I know, you can't create a sub-pipeline within a DoFn, i.e.
>>> nested pipelines are not supported.
>>>
>>> Best,
>>> Robin
>>>
>>> On Thu, Aug 16, 2018 at 7:03 AM Pascal Gula <[email protected]> wrote:
>>>
>>>> As a bonus, here is a simplified diagram view of the use-case:
>>>>
>>>> Cheers,
>>>> Pascal
>>>>
>>>>
>>>> On Thu, Aug 16, 2018 at 3:12 PM, Pascal Gula <[email protected]>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>> I am currently evaluating Apache Beam (to be executed later on Google
>>>>> Dataflow), and for the first use case I am working on, I have a design
>>>>> question, to see whether any of you has already faced a similar one.
>>>>> Namely, we have a DB describing dashboard views, and for each view
>>>>> we would like to perform an aggregation transform.
>>>>> My first approach would be to create a higher-level pipeline that
>>>>> fetches all view configurations from our MongoDB (BTW, we released a
>>>>> MongoDB IO connector here: https://pypi.org/project/beam-extended/). With
>>>>> this PCollection of views, the idea is to have a ParDo with a DoFn that
>>>>> creates a sub-pipeline to perform the aggregation on data from our plant
>>>>> database, with a query derived from the view configuration. Afterwards, the
>>>>> idea is to save, for the higher-level pipeline, some performance/data
>>>>> metrics related to the execution of the array of sub-pipelines.
>>>>> The main question is: are nested pipelines supported by the runner?
>>>>> I hope that my description was clear enough. I will work on a diagram
>>>>> view in the meantime.
>>>>> Very best regards,
>>>>> Pascal
>>>>>
>>>>> --
>>>>>
>>>>> Pascal Gula
>>>>> Senior Data Engineer / Scientist
>>>>> +49 (0)176 34232684
>>>>> www.plantix.net <http://plantix.net/>
>>>>> PEAT GmbH
>>>>> Kastanienallee 4
>>>>> 10435 Berlin // Germany
>>>>> Download the App! <https://play.google.com/store/apps/details?id=com.peat.GartenBank>
>>>>>
>>>>>
>>>>
>>>>
>>
>>


