Yeah, I understand you'd like to save the time that benchmarking would take, and that's a wish I can sympathise with. I just think the number of variables is much greater than that. Good luck - with both getting the numbers and trusting that the answers cover your case :).
On Sat, Nov 20, 2021 at 10:08 PM Nicolas Paris <[email protected]> wrote:
> > Thanks for the interesting resources.
> >
> > I believe it is impossible to get a straight answer from anyone,
>
> agreed, real-life testing is mandatory - as it is for any IT project,
> isn't it? That being said, I bet there is a finite set of categories of
> how Airflow is used, such as:
>
> - high resource-consuming tasks (Python computing operators...)
> - low resource-consuming tasks (external SQL queries, API calls...)
> - hybrid resource-consuming tasks
> - long-running dags
> - short-running dags
> - many dags / tasks
> - few dags / tasks
>
> For each of those categories, general advice could be given, such as:
>
> - Kubernetes is great for few long-running hybrid tasks
> - Celery is great for many short-running tasks
>
> In any case, multiple schedulers are great for HA. I just wonder if
> anyone here has seen improvements by using multiple schedulers in the
> case of:
>
> - high numbers of dags
> - low resource-consuming tasks
>
> This would save me benchmarking this, and BTW, if I eventually do
> benchmarks, I believe sharing the results will be useful for people.
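
In case it helps anyone who does end up benchmarking, here is a minimal sketch of the kind of fixture I had in mind for the "high number of dags / low resource-consuming tasks" case: one file that dynamically generates many small DAGs with near-no-op tasks, so the load lands on the scheduler rather than on the workers. This assumes Airflow 2.x with the stock PythonOperator; N_DAGS, TASKS_PER_DAG, and the dag naming are placeholders to tune, not anything official.

```python
# Sketch of a scheduler-stress benchmark fixture (my assumption of what
# such a benchmark would look like, not a reference implementation).
# Drop into the dags/ folder of each setup (1 vs. N schedulers) and
# compare scheduling delay / task latency.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

N_DAGS = 100        # placeholder: "high number of dags"
TASKS_PER_DAG = 10  # placeholder: tasks per dag


def noop():
    """Deliberately trivial: stands in for a low-resource task (API call, external SQL)."""
    return None


for i in range(N_DAGS):
    dag_id = f"bench_dag_{i}"
    with DAG(
        dag_id=dag_id,
        start_date=datetime(2021, 1, 1),
        schedule_interval="*/5 * * * *",  # short interval keeps the scheduler busy
        catchup=False,
    ) as dag:
        for j in range(TASKS_PER_DAG):
            PythonOperator(task_id=f"task_{j}", python_callable=noop)

    # Airflow discovers DAG objects via the module's global namespace
    globals()[dag_id] = dag
```

Running the same file against a single-scheduler and a multi-scheduler deployment (same metadata DB, same executor) would at least isolate the variable you are asking about.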
