Hi,

We are on version 1.23.2 and have some questions about our ETL data pipeline.
The pipeline connects to an Oracle DB to pull data incrementally (every few minutes or hours), runs transformations, and loads the results to S3. We are seeing a bottleneck at the Oracle extraction step, but when we assign more threads to extraction, the bottleneck moves to the transformation stage because extraction monopolizes the thread pool. Is there a way to assign the threads dynamically? The data in Oracle is not uniformly distributed, so some days/hours have much more data than others, and for those periods having more extraction threads helps tremendously.

Any advice or recommendations on how to approach performance tuning in this scenario? Should we simply divvy up the available threads evenly between the extract and transform processors? We are not sure of the best way to assign the number of threads to each processor to maximize the throughput of the pipeline.

Best,
Eric
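
P.S. For concreteness, here is roughly the kind of "dynamic" allocation I have in mind, written as a standalone Python toy rather than in our actual tool. Everything in it (extract_batch, transform_batch, load_to_s3, the pool size, the simulated timings) is a made-up placeholder, not our real processors or any real API. The idea is simply that both stages draw tasks from one shared pool, so threads go to whichever stage currently has work instead of being split up front.

# Hypothetical sketch: a single shared thread pool that both stages draw from,
# so threads shift to whichever stage has pending work instead of being split
# statically. extract_batch(), transform_batch(), and load_to_s3() are
# placeholders for the actual extract/transform/load steps.

import time
import random
from concurrent.futures import ThreadPoolExecutor, as_completed

POOL_SIZE = 8  # total threads available to the whole pipeline


def extract_batch(hour: int) -> list[int]:
    """Placeholder for the incremental Oracle pull for one hour of data."""
    time.sleep(random.uniform(0.1, 0.5))      # simulate uneven extract cost
    return [hour] * random.randint(10, 1000)  # some hours have far more rows


def transform_batch(rows: list[int]) -> list[int]:
    """Placeholder for the transformation step."""
    time.sleep(0.001 * len(rows))             # cost scales with batch size
    return [r * 2 for r in rows]


def load_to_s3(rows: list[int]) -> None:
    """Placeholder for the S3 load step."""
    time.sleep(0.05)


def run_pipeline(hours: range) -> None:
    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        # Every stage is just a task in the same pool, so a heavy hour can use
        # many threads for extraction while lighter hours leave threads free
        # for transformation, and vice versa -- no fixed per-stage split.
        extract_futures = [pool.submit(extract_batch, h) for h in hours]
        transform_futures = [
            pool.submit(transform_batch, f.result())
            for f in as_completed(extract_futures)
        ]
        for f in as_completed(transform_futures):
            load_to_s3(f.result())


if __name__ == "__main__":
    run_pipeline(range(24))

Is there an equivalent way to get this kind of sharing between processors, or do we have to keep tuning fixed per-processor thread counts?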
