Hello, I am new to the Task Framework and need help understanding a few concepts. What is the best practice for jobs with dependencies, when the number of tasks in a child job also depends on the result of the parent job?
For example, job_1 lists all databases, and job_2 lists all tables in every database that job_1 found. The workflow examples I found either define the tasks statically or start a fixed number of tasks for a job. If I understand correctly, since I don't know in advance how many tasks job_2 needs, I should make a best guess and use a larger number as the number of partitions. For example, when I start the workflow, I can configure job_2 to run 10 tasks, no matter how many databases exist. If job_1 then finds 100 databases, Helix Task Framework will somehow assign 10 databases to each task. Is this correct?

Thanks,
Yi
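
P.S. In case it helps to make the scenario concrete, below is a rough sketch in plain Java of the kind of split I am imagining for job_2. It does not use any Helix API and is not a claim about how Helix actually assigns work. The class name PartitionSketch, the split method, and the db_N names are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: spreads the output of job_1 (a list of
// database names) across a fixed number of job_2 tasks, round-robin.
public class PartitionSketch {

    // Returns numTasks buckets; bucket i holds the items for task i.
    static List<List<String>> split(List<String> items, int numTasks) {
        if (numTasks <= 0) {
            throw new IllegalArgumentException("numTasks must be positive");
        }
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            buckets.add(new ArrayList<>());
        }
        // Item i goes to bucket i % numTasks, so bucket sizes differ by at most one.
        for (int i = 0; i < items.size(); i++) {
            buckets.get(i % numTasks).add(items.get(i));
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Pretend job_1 found 100 databases.
        List<String> databases = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            databases.add("db_" + i);
        }
        // job_2 was configured up front with 10 tasks.
        List<List<String>> perTask = split(databases, 10);
        for (int t = 0; t < perTask.size(); t++) {
            System.out.println("task " + t + " handles " + perTask.get(t).size() + " databases");
        }
    }
}
```

With a fixed task count like this, a run where job_1 finds fewer databases than there are tasks would leave some tasks with nothing to do, which is part of why I am unsure whether this is the intended pattern.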
