I am not sure how you want to multiply the tasks - I imagine you want something like task1_1, task1_2, task1_3, etc. Or maybe you are thinking about backfilling and running the same tasks for different runs?
In both cases you can use pools: https://airflow.apache.org/docs/stable/concepts.html#pools to limit how many tasks can run in parallel for a given pool.

There is another way to limit the parallelism of tasks - applicable, for example, when you have different kinds of machines with different capabilities (for example with/without GPU). You can define affinities between tasks and the machines that execute them - with the Celery executor you can define queues in your Celery configuration and assign each task to one of the queues. Then you can define the number of workers/slots for all machines serving a queue, and as a result you can also limit the parallelism of the tasks this way: https://airflow.apache.org/docs/stable/concepts.html#queues

J.

On Thu, Dec 19, 2019 at 1:29 AM Reed Villanueva <[email protected]> wrote:

> Is there a way to control the parallelism for particular tasks in an
> airflow dag? Eg. say I have a dag definition like...
>
> for dataset in list_of_datasets:
>     # some simple operation
>     task_1 = BashOperator(task_id=f'task_1_{dataset.name}', ...)
>     # load intensive operation
>     task_2 = BashOperator()
>     # another simple operation
>     task_3 = BashOperator()
>
>     task_1 >> task_2 >> task_3
>
> Is there a way to have something where task_1 can have, say, 5 of its kind
> running in a dag instance, while only 2 instances of task_2 may be running
> in a dag instance (also implying that if there are 2 instances of task_2
> already running, then only 3 instances of task_1 can run)? Any other common
> ways to work around this kind of requirement (I imagine this must come up
> often for pipelines)?

--
Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129
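
P.S. To make the pool and queue suggestions above a bit more concrete, here is a minimal sketch rather than a tested setup. It assumes Airflow 1.10 import paths. The pool names, the queue name, the DAG id and the bash commands are all made up for illustration. The pools have to exist already (Admin -> Pools in the UI) with the slot counts you want, for example 5 and 2, and the queue only has an effect with the Celery executor and workers listening on it.

```python
from datetime import datetime

# Hypothetical names: create these pools beforehand with e.g. 5 and 2 slots.
TASK_1_POOL = "task_1_pool"
TASK_2_POOL = "task_2_pool"
# Hypothetical Celery queue served only by the bigger machines.
HEAVY_QUEUE = "heavy"


def build_task_args(dataset_name):
    """Return BashOperator keyword arguments for one dataset's three tasks."""
    return [
        # simple operation, limited by the slots of TASK_1_POOL
        {"task_id": f"task_1_{dataset_name}",
         "bash_command": "echo simple",
         "pool": TASK_1_POOL},
        # load intensive operation, limited by TASK_2_POOL and routed to HEAVY_QUEUE
        {"task_id": f"task_2_{dataset_name}",
         "bash_command": "echo heavy",
         "pool": TASK_2_POOL,
         "queue": HEAVY_QUEUE},
        # another simple operation, left in the default pool and queue
        {"task_id": f"task_3_{dataset_name}",
         "bash_command": "echo simple"},
    ]


def build_dag(dataset_names):
    """Build a DAG with one task_1 >> task_2 >> task_3 chain per dataset."""
    # Imported here so build_task_args can be inspected without Airflow installed.
    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator  # Airflow 1.10 path

    dag = DAG("pooled_tasks", start_date=datetime(2019, 12, 1),
              schedule_interval=None)
    for name in dataset_names:
        t1, t2, t3 = [BashOperator(dag=dag, **kwargs)
                      for kwargs in build_task_args(name)]
        t1 >> t2 >> t3
    return dag
```

In a DAG file you would call something like `dag = build_dag(["a", "b"])` at module level so the scheduler picks it up. Note that the two pools limit task_1 and task_2 independently; they do not by themselves give you the combined cap of 5 you describe.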
