Hi Mark,
Writing to the DB in bulk would be the first step. Have you looked into
writing to the DB with a larger batch size? I believe mysql-beam-connector
also supports this.
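
Roughly, a minimal sketch of what I mean, assuming the connector's
WriteToMySQL transform accepts a batch_size argument as its README suggests
(the host, credentials, and table names below are placeholders):

    import apache_beam as beam
    from beam_mysql.connector.io import WriteToMySQL

    with beam.Pipeline() as p:
        (
            p
            # Placeholder source; in your case this is the ~400M datapoints.
            | "CreateDatapoints" >> beam.Create([{"id": 1, "value": 0.5}])
            | "WriteToMySQL" >> WriteToMySQL(
                host="your-cloudsql-host",   # placeholder
                database="your_db",          # placeholder
                table="datapoints",          # placeholder
                user="your_user",            # placeholder
                password="your_password",    # placeholder
                port=3306,
                # Batch many rows into each INSERT instead of writing row by row.
                batch_size=10000,
            )
        )

A larger batch size means fewer round trips and fewer commits on the MySQL
side, which is usually where the time goes at ~5000 rows per second.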



On Wed, May 18, 2022 at 2:13 AM Mark Striebeck <[email protected]>
wrote:

> Hi,
>
> We have a data pipeline that produces ~400M datapoints each day. If we run
> it without storing, it finishes in a little over an hour. If we run it and
> store the datapoints in a MySQL database it takes several hours.
>
> We are running on GCP Dataflow, and the MySQL instances are hosted on GCP.
> We are using mysql-beam-connector
> <https://github.com/esakik/beam-mysql-connector>.
>
> The pipeline writes ~5000 datapoints per second.
>
> A couple of questions:
>
>    - Does this throughput sound reasonable or could it be significantly
>    improved by optimizing the database?
>    - The pipeline runs several workers to write this out - and because
>    it's a write operation they contend for write access. Is it better to
>    write out through just one worker and one connection?
>    - Is it actually faster to write from the pipeline to pubsub or kafka
>    or such and have a client on the other side which then writes in bulk?
>
> Thanks for any ideas or pointers (no, I'm by no means an
> experienced DBA!!!)
>
>      Mark
>
