Hi,

As per the Authoring I/O Transforms guide <https://beam.apache.org/documentation/io/authoring-overview/>, the recommended way to implement a Read transform (for a source that can be read in parallel) has these steps:

- Splitting the data into parts to be read in parallel (ParDo)
- Reading from each of those parts (ParDo)
- With a GroupByKey in between the two ParDos

The stated motivation for the GroupByKey is that "it allows the runner to use different numbers of workers" for the splitting and reading parts.

Can someone elaborate (or point to some relevant docs) on how the GroupByKey enables the use of a different number of workers for the two ParDo steps?

Thanks,
Mohamed
