Hello there,

I am using Kudu at CERN with a positive experience, and thanks for the performance improvements in 1.4! I have recently encountered an issue that I am unable to work around; it is as follows.
I have an 18-node Kudu cluster, each node with 32 cores, 128 GB of memory, and 2 disks. Using the Spark API, I am inserting data into a Kudu table at a sustained rate of 750k rows per second (which is awesome). However, after a few days my filesystems were becoming full (18 * 3TB = 54TB), even though the on_disk_size reported in the metrics is only around 4-5 TB. The filesystems come back to the expected size after I stop the insertion for 6-8 hours, so I suspect some post-processing, such as rowset compactions, is unable to keep up with the insertion rate.

I do have spare resources on the nodes. Can you please point me to how I can troubleshoot this issue, or to any parameter changes that could speed up these maintenance operations? (I currently have --maintenance_manager_num_threads=20.) Any help or clues on where to look would be highly appreciated.

Best regards,
Prasanth
CERN IT
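P.S. For reference, the ingestion job is essentially a plain kudu-spark insert along the lines of the minimal sketch below (assuming KuduContext.insertRows; the master address, table name, columns, and source DataFrame are placeholders, not the actual job):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.kudu.spark.kudu.KuduContext

object KuduIngestSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kudu-ingest-sketch").getOrCreate()
    import spark.implicits._

    // Placeholder batch; in practice the DataFrame comes from the upstream pipeline.
    val batch = Seq((1L, "payload")).toDF("id", "payload")

    // KuduContext from the kudu-spark integration; master address is a placeholder.
    val kuduContext = new KuduContext("kudu-master-host:7051", spark.sparkContext)

    // Rows are written as plain inserts into a pre-existing table (placeholder name).
    kuduContext.insertRows(batch, "my_table")

    spark.stop()
  }
}
```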
