Hi!

I have a Cloud Dataflow job that is not scaling.

The job sequence is the following (a rough Python sketch after the list):
1 - [IO] Read from a file in the bucket (1 element out)
2 - [ParDo] With the file information, run a query against a database (10,000 elements out)
3 - [ParDo] Process the elements
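
Roughly, the pipeline looks like this (a sketch in the Python SDK; the bucket path and run_query are placeholders, not my actual code):

    import apache_beam as beam

    def run_query(file_info):
        # Placeholder for the real database call; yields ~10,000 rows.
        return ({'id': i} for i in range(10000))

    class QueryDatabase(beam.DoFn):
        def process(self, file_info):
            # One input element in, ~10,000 elements out.
            for row in run_query(file_info):
                yield row

    with beam.Pipeline() as p:
        (p
         | 'Read' >> beam.io.ReadFromText('gs://my-bucket/input-file')  # 1 element out
         | 'Query' >> beam.ParDo(QueryDatabase())                       # 10,000 elements out
         | 'Work' >> beam.Map(lambda element: element))                 # stand-in for the real processing

This version stays on very few workers even though the second step fans out to 10,000 elements.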

But when I read from a file that already contains the same database query results, it scales to 60+ workers (sketch after the list):
1 - [IO] Read from a file in the bucket (10,000 elements out)
2 - [ParDo] Process the elements
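
That variant is essentially (again with a made-up path, and a stand-in for the real processing):

    import apache_beam as beam

    with beam.Pipeline() as p:
        (p
         | 'Read' >> beam.io.ReadFromText('gs://my-bucket/query-results')  # ~10,000 elements out
         | 'Work' >> beam.Map(lambda element: element))                    # stand-in for the real processing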

Do I have to develop a custom I/O connector so that Apache Beam knows how many elements it's dealing with?

Best regards
André Rocha Silva
