Hello,

I am new to the Task Framework and need help understanding a few concepts.
What is the best practice for jobs with dependencies, when the number of
tasks in a child job also depends on the result of the parent job?

For example, job_1 lists all databases, and job_2 lists all tables in every
database found by job_1. The workflow examples I found either define the
tasks statically or start a fixed number of tasks for a job.

If I understand correctly, since I don't know exactly how many tasks I need
in job_2, I should make my best guess and use a larger number as the number
of partitions. For example, when I start the workflow, I can configure
job_2 to run 10 tasks, no matter how many databases exist. If job_1 finds
100 databases, Helix Task Framework will somehow assign 10 databases to
each task. Is this correct?
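
To make the question concrete, here is a rough sketch of the kind of even
split I have in mind. It is plain Python and does not use any Helix API; the
helper name split_evenly and the db_N names are made up for illustration,
and I do not know whether Helix does anything like this internally.

```python
def split_evenly(items, num_tasks):
    """Split items into num_tasks contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), num_tasks)
    chunks = []
    start = 0
    for i in range(num_tasks):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


# Hypothetical result of job_1: 100 database names.
databases = [f"db_{i}" for i in range(100)]

# Hypothetical fixed task count configured for job_2.
assignments = split_evenly(databases, 10)
print([len(chunk) for chunk in assignments])
```

If the database count is not a multiple of the task count, some tasks in
this sketch simply get one more database than others.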

Thanks,
Yi
