Also, keep in mind that when the MRS flushes, it flushes into a bunch of separate RowSets, not 1:1. It "rolls" to a new RowSet every N MB (N=32 by default). This is controlled by --budgeted_compaction_target_rowset_size.
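As a rough mental model of that rolling behavior (an illustrative sketch only, not Kudu's actual C++ implementation; the helper name is made up), a flush is a sorted stream being chunked by a size budget:

```python
# Illustrative sketch only -- not Kudu code. Shows how one flush of a
# sorted in-memory rowset "rolls" into multiple size-bounded RowSets.
ROLL_SIZE = 32 * 1024 * 1024  # default --budgeted_compaction_target_rowset_size

def roll_flush(sorted_rows, row_size, roll_size=ROLL_SIZE):
    """Split one sorted row stream into non-overlapping, size-bounded chunks."""
    rowsets, current, current_bytes = [], [], 0
    for row in sorted_rows:
        current.append(row)
        current_bytes += row_size
        if current_bytes >= roll_size:  # budget reached: roll to a new RowSet
            rowsets.append(current)
            current, current_bytes = [], 0
    if current:
        rowsets.append(current)
    return rowsets
```

Because the input stream is sorted, each emitted chunk covers a disjoint slice of the primary-key space.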
However, increasing this size isn't likely to decrease the number of compactions, because each of these 32MB RowSets is non-overlapping. In other words, if your MRS contains rows A-Z, the output RowSets will include [A-C], [D-G], [H-P], [Q-Z]. Since these ranges do not overlap, they will never need to be compacted with each other. The net result here is that compaction becomes more fine-grained and only needs to operate on sub-ranges of the tablet where there is a lot of overlap.

You can read more about this in docs/design-docs/compaction-policy.md, in particular the section "Limiting RowSet Sizes".

Hope that helps
-Todd

On Fri, Jun 15, 2018 at 8:26 AM, William Berkeley <wdberke...@gmail.com> wrote:

> The op seen in the logs is a rowset compaction, which takes existing
> diskrowsets and rewrites them. It's not a flush, which writes data in
> memory to disk, so I don't think flush_threshold_mb is relevant. Rowset
> compaction is done to reduce the amount of overlap of rowsets in primary
> key space, i.e. to reduce the number of rowsets that might need to be
> checked to enforce the primary key constraint or to find a row. Having
> lots of rowset compaction indicates that rows are being written in a
> somewhat random order w.r.t. the primary key order. Kudu will perform
> much better as writes scale when rows are inserted roughly in increasing
> order per tablet.
>
> Also, because you are using the log block manager (the default and the
> only one suitable for production deployments), there isn't a 1:1
> relationship between cfiles or diskrowsets and files on the filesystem.
> Many cfiles and diskrowsets will be put together in a container file.
>
> Config parameters that might be relevant here:
> --maintenance_manager_num_threads
> --fs_data_dirs (how many)
> --fs_wal_dir (is it shared on a device with the data dirs?)
>
> The metrics from the compact row sets op indicate the time is spent in
> fdatasync and in reading (likely reading the original rowsets).
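The A-Z example above can be checked in a few lines: a compaction policy only ever needs to consider rowsets whose primary-key ranges intersect. A minimal sketch (hypothetical helper names, not Kudu's actual policy code):

```python
# Illustrative sketch -- not Kudu's compaction policy. Rowsets whose
# key ranges do not overlap never need to be compacted together.

def overlaps(a, b):
    """True if key ranges a=(lo, hi) and b=(lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]

def compaction_candidates(rowsets):
    """Return the pairs of rowsets whose key ranges overlap."""
    pairs = []
    for i in range(len(rowsets)):
        for j in range(i + 1, len(rowsets)):
            if overlaps(rowsets[i], rowsets[j]):
                pairs.append((rowsets[i], rowsets[j]))
    return pairs

# The rolled output of one MRS flush is non-overlapping: nothing to compact.
flush_output = [("A", "C"), ("D", "G"), ("H", "P"), ("Q", "Z")]
print(compaction_candidates(flush_output))  # []

# Randomly ordered inserts across flushes produce overlapping rowsets:
print(compaction_candidates([("A", "M"), ("C", "R")]))
```

This is also why inserting roughly in key order per tablet (as Will notes below) avoids most rowset compaction: successive flushes land in disjoint key ranges.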
> The overall compaction time is kinda long but not crazy long. What's the
> performance you are seeing and what is the performance you would like to
> see?
>
> -Will
>
> On Fri, Jun 15, 2018 at 7:52 AM, Quanlong Huang <huang_quanl...@126.com> wrote:
>
>> Hi all,
>>
>> I'm running kudu 1.6.0-cdh5.14.2. When looking into the logs of the
>> tablet server, I find most of the compactions are compacting small
>> files (~40MB each). For example:
>>
>> I0615 07:22:42.637351 30614 tablet.cc:1661] T 6bdefb8c27764a0597dcf98ee1b450ba P 70f3e54fe0f3490cbf0371a6830a33a7: Compaction: stage 1 complete, picked 4 rowsets to compact
>> I0615 07:22:42.637385 30614 compaction.cc:903] Selected 4 rowsets to compact:
>> I0615 07:22:42.637393 30614 compaction.cc:906] RowSet(343)(current size on disk: ~40666600 bytes)
>> I0615 07:22:42.637401 30614 compaction.cc:906] RowSet(1563)(current size on disk: ~34720852 bytes)
>> I0615 07:22:42.637408 30614 compaction.cc:906] RowSet(1645)(current size on disk: ~29914833 bytes)
>> I0615 07:22:42.637415 30614 compaction.cc:906] RowSet(1870)(current size on disk: ~29007249 bytes)
>> I0615 07:22:42.637428 30614 tablet.cc:1447] T 6bdefb8c27764a0597dcf98ee1b450ba P 70f3e54fe0f3490cbf0371a6830a33a7: Compaction: entering phase 1 (flushing snapshot).
>> Phase 1 snapshot: MvccSnapshot[committed={T|T < 6263071556616208384 or (T in {6263071556616208384})}]
>> I0615 07:22:42.641582 30614 multi_column_writer.cc:103] Opened CFile writers for 124 column(s)
>> I0615 07:22:43.875396 30614 multi_column_writer.cc:103] Opened CFile writers for 124 column(s)
>> I0615 07:22:44.418421 30614 multi_column_writer.cc:103] Opened CFile writers for 124 column(s)
>> I0615 07:22:45.114389 30614 multi_column_writer.cc:103] Opened CFile writers for 124 column(s)
>> I0615 07:22:54.762563 30614 tablet.cc:1532] T 6bdefb8c27764a0597dcf98ee1b450ba P 70f3e54fe0f3490cbf0371a6830a33a7: Compaction: entering phase 2 (starting to duplicate updates in new rowsets)
>> I0615 07:22:54.773572 30614 tablet.cc:1587] T 6bdefb8c27764a0597dcf98ee1b450ba P 70f3e54fe0f3490cbf0371a6830a33a7: Compaction Phase 2: carrying over any updates which arrived during Phase 1
>> I0615 07:22:54.773599 30614 tablet.cc:1589] T 6bdefb8c27764a0597dcf98ee1b450ba P 70f3e54fe0f3490cbf0371a6830a33a7: Phase 2 snapshot: MvccSnapshot[committed={T|T < 6263071556616208384 or (T in {6263071556616208384})}]
>> I0615 07:22:55.189757 30614 tablet.cc:1631] T 6bdefb8c27764a0597dcf98ee1b450ba P 70f3e54fe0f3490cbf0371a6830a33a7: Compaction successful on 82987 rows (123387929 bytes)
>> I0615 07:22:55.191426 30614 maintenance_manager.cc:491] Time spent running CompactRowSetsOp(6bdefb8c27764a0597dcf98ee1b450ba): real 12.628s user 1.460s sys 0.410s
>> I0615 07:22:55.191484 30614 maintenance_manager.cc:497] P 70f3e54fe0f3490cbf0371a6830a33a7: CompactRowSetsOp(6bdefb8c27764a0597dcf98ee1b450ba) metrics: {"cfile_cache_hit":812,"cfile_cache_hit_bytes":16840376,"cfile_cache_miss":2730,"cfile_cache_miss_bytes":251298442,"cfile_init":496,"data dirs.queue_time_us":6646,"data dirs.run_cpu_time_us":2188,"data dirs.run_wall_time_us":101717,"fdatasync":315,"fdatasync_us":9617174,"lbm_read_time_us":1288971,"lbm_reads_1-10_ms":32,"lbm_reads_10-100_ms":41,"lbm_reads_lt_1ms":4641,"lbm_write_time_us":122520,"lbm_writes_lt_1ms":2799,"mutex_wait_us":25,"spinlock_wait_cycles":155264,"tcmalloc_contention_cycles":768,"thread_start_us":677,"threads_started":14,"wal-append.queue_time_us":300}
>>
>> The flush_threshold_mb is set to the default value (1024). Wouldn't the
>> flushed file size be ~1GB?
>>
>> I think increasing the initial RowSet size could reduce compactions and
>> thus reduce the impact on other ongoing operations. It might also improve
>> flush performance. Is that right? If so, how can I increase the RowSet
>> size?
>>
>> I'd be grateful if someone could clarify these points for me!
>>
>> Thanks,
>> Quanlong

--
Todd Lipcon
Software Engineer, Cloudera
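To make the fdatasync observation concrete: the metrics blob in the CompactRowSetsOp log line above is JSON, so the sync share of the reported 12.628s wall time can be computed directly. A small sketch with the values hardcoded from the log (only a subset of keys is used):

```python
# Parse a subset of the op metrics from the log above and compute how
# much of the 12.628s wall time went to fdatasync vs. block reads.
import json

metrics = json.loads('{"fdatasync":315,"fdatasync_us":9617174,'
                     '"lbm_read_time_us":1288971,"lbm_write_time_us":122520}')

wall_s = 12.628  # "real" time reported for CompactRowSetsOp
fdatasync_s = metrics["fdatasync_us"] / 1e6
read_s = metrics["lbm_read_time_us"] / 1e6

print(f"fdatasync: {fdatasync_s:.1f}s ({fdatasync_s / wall_s:.0%} of wall)")
print(f"block-manager reads: {read_s:.1f}s")
# fdatasync accounts for roughly three quarters of the wall time,
# which matches Will's reading that the op is sync- and read-bound.
```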