Since I'm not sure which flush mode is used at the client side, I suggest
making sure that your application is using the AUTO_FLUSH_BACKGROUND flush
mode (Java API link):
https://kudu.apache.org/apidocs/org/apache/kudu/client/SessionConfiguration.FlushMode.html#AUTO_FLUSH_BACKGROUND

Another point is the size of the write buffer at the client side. If you
are using the Kudu Java client, with only 100 bytes per row, a KuduSession
in AUTO_FLUSH_BACKGROUND mode buffers only 1000 rows, which is about 100K
per write batch. Consider increasing the size of the buffer by at least 10x
using the KuduSession.setMutationBufferSpace() method:
https://kudu.apache.org/apidocs/org/apache/kudu/client/KuduSession.html#setMutationBufferSpace-int-

Thanks,

Alexey

On Wed, Nov 13, 2019 at 4:12 PM Adar Lieber-Dembo <a...@cloudera.com> wrote:

> Oh whoops, I didn't scroll down and missed that. Thanks!
>
> Mauricio's suggestion is a good one. To that I would add: consider
> increasing the number of hash buckets.
>
> Additionally, what does the rest of the primary key look like? _key and
> event_time are in there, but in what order? UUIDs in particular are
> usually a poor choice for primary keys because of their random
> distribution, all but guaranteeing lots of compaction during ingest,
> which slows down throughput considerably. How bad it is depends on the
> arrangement of columns in the primary key, and how that order reflects
> (or does not reflect) the key order of incoming data.
>
> On Wed, Nov 13, 2019 at 4:00 PM Mauricio Aristizabal <mauri...@impact.com>
> wrote:
>
>> You should start by making sure each of your 3 hash partition tablets'
>> leaders is on a different one of your 3 nodes. It could very well be
>> that all 3 were on the same tablet server and you were ingesting into a
>> single node. If needed, use leader_step_down
>> <https://kudu.apache.org/docs/command_line_tools_reference.html#tablet-leader_step_down>
>> to move leaders around.
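The flush-mode and buffer-size suggestions at the top of this reply could be
sketched with the Kudu Java client roughly as follows. This is a sketch only:
"master1:7051" and the buffer value of 10000 are illustrative placeholders,
and it assumes kudu-client is on the classpath.

```java
import org.apache.kudu.client.KuduClient;
import org.apache.kudu.client.KuduException;
import org.apache.kudu.client.KuduSession;
import org.apache.kudu.client.SessionConfiguration;

public class SessionTuning {
    public static void main(String[] args) throws KuduException {
        // "master1:7051" is a placeholder for your real master addresses.
        try (KuduClient client =
                 new KuduClient.KuduClientBuilder("master1:7051").build()) {
            KuduSession session = client.newSession();

            // Batch writes in the background instead of flushing per operation.
            session.setFlushMode(
                SessionConfiguration.FlushMode.AUTO_FLUSH_BACKGROUND);

            // Default is 1000 buffered ops (~100 KB at ~100 bytes/row);
            // raise it at least 10x, as suggested above.
            session.setMutationBufferSpace(10000);

            // ... apply Insert operations, then session.flush() and
            // session.close() before shutting down ...
        }
    }
}
```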
>>
>> FYI Adar, table schema was at the bottom, inside that iframe.
>>
>> On Wed, Nov 13, 2019 at 3:24 PM Adar Lieber-Dembo <a...@cloudera.com>
>> wrote:
>>
>>> Some thoughts on how you might increase your write speed:
>>>
>>> - Don't use the same disk for both WAL and data directories. If you
>>> have enough disks, dedicate one to the WAL and the rest to data
>>> directories.
>>>
>>> - Since each disk is an SSD, experiment with a higher ratio of MM
>>> (maintenance manager) threads to data directories. We typically
>>> recommend 1:3, but that's for spinning disks. I see you've configured
>>> 2 MM threads for the masters but are still using just 1 for the
>>> tservers? Consider using 2-4.
>>>
>>> - How is your schema structured? Are you using hash partitioning?
>>> Range partitioning? Both? What does your primary key look like, and
>>> does incoming data arrive in sorted order (or mostly sorted order)
>>> w.r.t. that key? Random order?
>>> https://kudu.apache.org/docs/schema_design.html is an excellent
>>> resource for understanding how schema can impact writes and reads.
>>>
>>> On Wed, Nov 13, 2019 at 3:07 PM wei ximing <wxmimpe...@outlook.com>
>>> wrote:
>>> >
>>> > Hi!
>>> >
>>> > I have some questions about Kudu performance tuning.
>>> >
>>> > Kudu version: kudu 1.7.0-cdh5.16.2
>>> >
>>> > System memory per node: 256G
>>> >
>>> > 4 SSDs per machine: 512G
>>> >
>>> > Three Master nodes and three Tserver nodes.
>>> >
>>> > // Master config
>>> > --fs_wal_dir=/mnt/disk1/kudu/var/wal
>>> > --fs_data_dirs=/mnt/disk1/kudu/var/data,/mnt/disk2/kudu/var/data,/mnt/disk3/kudu/var/data,/mnt/disk4/kudu/var/data
>>> > --fs_metadata_dir=/mnt/disk1/kudu/var/metadata
>>> > --log_dir=/mnt/disk1/kudu/var/logs
>>> > --master_addresses=xxxx
>>> > --maintenance_manager_num_threads=2
>>> > --block_cache_capacity_mb=6144
>>> > --memory_limit_hard_bytes=34359738368
>>> > --max_log_size=40
>>> >
>>> > // Tserver config
>>> > --fs_wal_dir=/mnt/disk1/kudu/var/wal
>>> > --fs_data_dirs=/mnt/disk1/kudu/var/data,/mnt/disk2/kudu/var/data,/mnt/disk3/kudu/var/data,/mnt/disk4/kudu/var/data
>>> > --fs_metadata_dir=/mnt/disk1/kudu/var/metadata
>>> > --log_dir=/mnt/disk1/kudu/var/logs
>>> > --tserver_master_addrs=xxxx
>>> > --block_cache_capacity_mb=6144
>>> > --memory_limit_hard_bytes=34359738368
>>> > --max_log_size=40
>>> >
>>> > // Table schema
>>> > // _key is a UUID for each msg
>>> > // event_time is the data time
>>> > // Schema has only 15 columns
>>> > // A single message does not exceed 100 bytes
>>> >
>>> > HASH (_key) PARTITIONS 3,
>>> > RANGE (event_time) (
>>> >   PARTITION 2019-10-31T16:00:00.000000Z <= VALUES < 2019-11-30T16:00:00.000000Z
>>> > )
>>> >
>>> > I wrote a program to write data to Kudu.
>>> >
>>> > In both manual and automatic flush modes, the write speed is only
>>> > 6 MB/s.
>>> >
>>> > I think SSDs should be faster than this, and neither the network nor
>>> > memory has reached a bottleneck.
>>> >
>>> > Is this the normal level of Kudu write performance? How can I tune it?
>>> >
>>> > Thanks.
>>
>> --
>> Mauricio Aristizabal
>> Architect - Data Pipeline
>> mauri...@impact.com | 323 309 4260
>> https://impact.com
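Adar's suggestion to increase the number of hash buckets could be sketched
with the Kudu Java client's CreateTableOptions along these lines. A sketch
only: the bucket count of 12 is illustrative (the schema above uses 3), and
it assumes kudu-client is on the classpath.

```java
import java.util.Arrays;
import org.apache.kudu.client.CreateTableOptions;

public class PartitioningSketch {
    // More hash buckets spread ingest across more tablets (and hence
    // across tablet servers). 12 is illustrative, not a recommendation.
    static CreateTableOptions tunedPartitioning() {
        return new CreateTableOptions()
            .addHashPartitions(Arrays.asList("_key"), 12)
            .setRangePartitionColumns(Arrays.asList("event_time"));
        // The monthly range partition itself would still be added via
        // addRangePartition(lowerBound, upperBound) with PartialRow bounds.
    }
}
```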