Hi Team,

This is regarding a performance issue my team and I are having with a large data load in Kudu. We are hoping you can guide us toward a solution to the concerns described below.
We have a data load of 212 million records in Kudu. Currently, when loading this data through Impala, query processing and loading take 47 seconds overall. We measured these numbers using the default Kudu and Impala configurations on a 6-node cluster. We have achieved the performance we expected at the Kudu level; however, we have not reached the performance we expected in Impala. What can we do to reduce the time spent loading these 212 million records through Impala from 47 seconds to 10 seconds? We would be much obliged if you could provide us with some solutions.

Thank you!

Best regards,
Sumudu Madushanka