Hi,

We are on version 1.23.2 and have some questions about our ETL data
pipeline.

The pipeline connects to an Oracle DB to pull data incrementally (in
minute/hour windows), applies transformations, and loads the results to S3.
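Roughly, the flow looks like the sketch below (simplified Python, not our
actual code; the stage functions, window labels, and pool sizes are just
placeholders):

from concurrent.futures import ThreadPoolExecutor

# Placeholder stage functions standing in for the real
# extract/transform/load processors.
def extract(window):
    """Pull one incremental window (minutes/hours) from Oracle."""
    return [f"row-{window}-{i}" for i in range(3)]

def transform(rows):
    """Apply the transformations to the extracted rows."""
    return [r.upper() for r in rows]

def load(records):
    """Write the transformed records to S3 (stubbed out here)."""
    print(f"loaded {len(records)} records")

# Today the thread split is static: a fixed pool per stage.
extract_pool = ThreadPoolExecutor(max_workers=4)
transform_pool = ThreadPoolExecutor(max_workers=4)

for window in ["window-00", "window-01"]:
    rows = extract_pool.submit(extract, window).result()
    records = transform_pool.submit(transform, rows).result()
    load(records)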

However, we are seeing a bottleneck on the Oracle extract, and when we
assign more threads to that stage it monopolizes the pool, which then turns
the transformation stage into the bottleneck.

Is there a way to assign the threads dynamically?  The data in Oracle is
not uniformly distributed, so some days/hours have much more data than
others, and for those windows giving the extract access to more threads
helps tremendously.
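By "dynamically" I mean something along the lines of the sketch below
(purely hypothetical; the total thread budget and the backlog-proportional
rule are made up to illustrate the idea, not something we have implemented):

TOTAL_THREADS = 8  # assumed shared thread budget for both stages

def plan_split(extract_backlog, transform_backlog):
    """Give each stage a share of the budget proportional to its
    current backlog, keeping at least one thread per stage."""
    total = extract_backlog + transform_backlog
    if total == 0:
        half = TOTAL_THREADS // 2
        return half, TOTAL_THREADS - half
    extract_threads = max(1, round(TOTAL_THREADS * extract_backlog / total))
    extract_threads = min(extract_threads, TOTAL_THREADS - 1)
    return extract_threads, TOTAL_THREADS - extract_threads

# A heavy hour with a large extract backlog would shift most of
# the budget to the extract stage:
print(plan_split(extract_backlog=120, transform_backlog=20))  # (7, 1)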

Any advice or recommendations on how to approach performance tuning in this
scenario?  Should we just divvy up the available threads evenly between the
extract and transformation processors?  We're not sure of the best way to
assign the number of threads to each processor to maximize throughput of
the pipeline.


Best,
Eric
