Hi Prasanth,

Great to hear that you're using Kudu at CERN and are happy with it overall. We all enjoyed reading the blog post that Zbigniew wrote about your experiences.
The temporary blow-up in space is an interesting effect that I wouldn't have expected.

During an ongoing insert, I would expect some space to be used for 'UNDO' deltas -- these are the records of the original insertion time of each row, and they allow snapshot queries to avoid returning rows inserted after the selected snapshot timestamp (there's a small sketch of such a read at the end of this mail). These records will be larger if you have a particularly large composite key -- especially in time series workloads they can be pretty big. They are only retained for 15 minutes, though, assuming that background tasks are allowed to run. In an ongoing workload it's possible that flushes and compactions are prioritized over removal of these UNDO deltas, so the deltas grow over time, but it's hard to say definitively that this is the case you're hitting.

Another case that might cause these issues is if a lot of UPSERTs or UPDATEs are going into the table (the second sketch at the end of this mail shows that kind of write). In that case we retain past versions of each row, and those past versions are not stored in a columnar format, so they can take a substantial amount of disk space. Again, they are only retained for 15 minutes by default, assuming there is some idle capacity to remove them, and if we aren't properly prioritizing their removal they may grow over time until the write workload becomes more idle.

Given that you're running Kudu 1.4, I'm guessing you may be compiling from source. In that case you might try cherry-picking c19b8f4a1a271af1efb5a01bdf05005d79bb85f6 (you will probably also need its parent, 96ad3b07cf1dc694ddcfd72405aeb662440199b5). These commits add a 'kudu local_replica data_size' command which can be run against a local tablet server to break down space usage by consumer: UNDO deltas, REDO deltas, base data, etc.

If you're able to cherry-pick those, I'd be interested to hear the results on your workload. Maybe we can tweak the maintenance process to prioritize data garbage collection more aggressively in certain circumstances.

Thanks
-Todd

On Wed, Jun 28, 2017 at 10:16 PM, Prasanth Kothuri <[email protected]> wrote:
> Hello There
>
> I am using Kudu @ CERN with a positive experience, and thanks for the
> performance improvements in 1.4!
>
> I have recently encountered an issue which I am unable to work around. It
> is as follows:
>
> I have an 18-node Kudu cluster, each node with 32 cores, 128 GB of memory,
> and 2 disks. Using the Spark API, I am inserting data into a Kudu table at
> a sustained rate of 750k rows per second (which is awesome). After a few
> days my filesystems were becoming full (18 * 3 TB = 54 TB) even though the
> on_disk_size reported in the metrics is around 4-5 TB. The filesystems come
> back to the expected size after I stop the insertion for 6-8 hours, so I
> suspect some post-processing, such as rowset compactions, is unable to keep
> up with the insertion rate. I do have spare resources on the nodes; could
> you point me to how I can troubleshoot this issue, or to any parameter
> changes which can speed up these maintenance operations? (I currently have
> --maintenance_manager_num_threads=20.)
>
> Any help / clues on where to look would be highly appreciated.
>
> Best Regards,
> Prasanth
> CERN IT

--
Todd Lipcon
Software Engineer, Cloudera
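P.S. For anyone following the thread who hasn't used snapshot reads, here is a rough sketch of the kind of scan those UNDO deltas exist to serve, using the Java client. The master address, table name, and schema are made up for illustration, so adjust them before pointing this at a real cluster.

  import org.apache.kudu.client.*;

  public class SnapshotScanSketch {
    public static void main(String[] args) throws KuduException {
      // Hypothetical master address and table name.
      KuduClient client = new KuduClient.KuduClientBuilder("kudu-master:7051").build();
      try {
        KuduTable table = client.openTable("metrics");

        // Read the table as of ~10 minutes ago. Rows inserted after this
        // timestamp are filtered out of the scan, which is what the UNDO
        // deltas (original insertion timestamps) make possible.
        long snapshotMicros = (System.currentTimeMillis() - 10 * 60 * 1000) * 1000L;
        KuduScanner scanner = client.newScannerBuilder(table)
            .readMode(AsyncKuduScanner.ReadMode.READ_AT_SNAPSHOT)
            .snapshotTimestampMicros(snapshotMicros)
            .build();

        long count = 0;
        while (scanner.hasMoreRows()) {
          RowResultIterator batch = scanner.nextRows();
          while (batch.hasNext()) {
            batch.next();
            count++;
          }
        }
        System.out.println("rows visible at snapshot: " + count);
      } finally {
        client.close();
      }
    }
  }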

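P.P.S. The row-history case mentioned above is triggered by writes like the one below: an UPSERT (or UPDATE) that rewrites an already-existing key leaves the previous row version behind until history GC removes it. Again, this is only a sketch against a hypothetical schema, not something specific to your table.

  import org.apache.kudu.client.*;

  public class UpsertSketch {
    public static void main(String[] args) throws KuduException {
      // Hypothetical master address, table name, and columns.
      KuduClient client = new KuduClient.KuduClientBuilder("kudu-master:7051").build();
      try {
        KuduTable table = client.openTable("metrics");
        KuduSession session = client.newSession();

        // If this key already exists, the row is rewritten and the old
        // version is kept (row-wise, not columnar) until it ages out.
        Upsert upsert = table.newUpsert();
        PartialRow row = upsert.getRow();
        row.addString("host", "web01");
        row.addLong("ts", 1498700000000000L);
        row.addDouble("value", 42.0);
        session.apply(upsert);
        session.flush();
        session.close();
      } finally {
        client.close();
      }
    }
  }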