It's often significantly better to simply leave numShards unspecified,
so that Dataflow can choose the number of shards itself and maximize
performance. A higher numShards only helps if you have at least that
many workers.
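
To make the point about workers concrete, here is a toy back-of-the-envelope
model in Java. It is only a sketch and not how Dataflow actually schedules
writes: it assumes equal-sized shards, one shard written per worker at a time,
and a fixed per-worker write throughput. The class ShardModel, the method
estimateSeconds, and all the numbers are made up for illustration.

```java
// Toy model only: NOT Dataflow's real scheduler. Assumes equal-sized shards,
// one shard written per worker at a time, and a fixed per-worker throughput.
class ShardModel {

    // Rough wall-clock estimate (seconds) for writing totalBytes as numShards files.
    static double estimateSeconds(long totalBytes, int numShards,
                                  int workers, double bytesPerSecondPerWorker) {
        if (numShards <= 0 || workers <= 0 || bytesPerSecondPerWorker <= 0) {
            throw new IllegalArgumentException("arguments must be positive");
        }
        // Number of "waves" needed when only `workers` shards can be written at once.
        long rounds = (numShards + (long) workers - 1) / workers;
        double bytesPerShard = totalBytes / (double) numShards;
        return rounds * bytesPerShard / bytesPerSecondPerWorker;
    }

    public static void main(String[] args) {
        long totalBytes = 1_000_000L;  // hypothetical data size
        int workers = 100;             // hypothetical worker count
        double throughput = 1_000.0;   // hypothetical bytes/s per worker
        for (int shards : new int[] {50, 100, 500, 1000}) {
            System.out.println("numShards=" + shards + " -> "
                + estimateSeconds(totalBytes, shards, workers, throughput) + " s");
        }
    }
}
```

In this model, going from 50 to 100 shards halves the estimate (20.0 s to
10.0 s), but 500 or 1000 shards give the same 10.0 s as 100, because only 100
workers are writing at any moment. Real jobs have effects this ignores, so
treat it as intuition rather than a prediction.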

On Thu, Feb 13, 2020 at 10:24 PM vivek chaurasiya <vivek....@gmail.com> wrote:
>
> hi folks, I have this in code
>
>     globalIndexJson.apply("GCSOutput",
>         TextIO.write().to(fullGCSPath).withSuffix(".txt").withNumShards(500));
>
> the same code is executed for 50GB, 3TB, 5TB of data. I want to know if 
> changing numShards for larger datasize will write to GCS faster?
