Since I'm not sure which flush mode your client is using, I suggest
making sure that your application uses the AUTO_FLUSH_BACKGROUND flush
mode (Java API link):

https://kudu.apache.org/apidocs/org/apache/kudu/client/SessionConfiguration.FlushMode.html#AUTO_FLUSH_BACKGROUND

Another point is the size of the write buffer at the client side.
If you are using the Kudu Java client with only about 100 bytes per row, a
KuduSession in AUTO_FLUSH_BACKGROUND mode buffers only 1000 rows, which
is about 100KB per write batch.  Consider increasing the size of the buffer
using the KuduSession.setMutationBufferSpace() method by at least 10x:

https://kudu.apache.org/apidocs/org/apache/kudu/client/KuduSession.html#setMutationBufferSpace-int-
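
A minimal sketch of that configuration with the Kudu Java client might look
like the following (it needs the kudu-client dependency and a live cluster;
the master address and the 10000-operation buffer size are illustrative
assumptions, not values from this thread):

```java
import org.apache.kudu.client.KuduClient;
import org.apache.kudu.client.KuduSession;
import org.apache.kudu.client.SessionConfiguration;

public class SessionTuning {
    public static void main(String[] args) throws Exception {
        // "kudu-master:7051" is a placeholder for your master address(es).
        KuduClient client = new KuduClient.KuduClientBuilder("kudu-master:7051").build();
        try {
            KuduSession session = client.newSession();

            // Buffer writes on the client and flush them in the background,
            // instead of paying one round trip per operation.
            session.setFlushMode(SessionConfiguration.FlushMode.AUTO_FLUSH_BACKGROUND);

            // The default is 1000 buffered operations (~100KB at ~100 bytes/row);
            // 10000 operations gives roughly 1MB batches.  Set this before the
            // first apply() on the session.
            session.setMutationBufferSpace(10000);

            // ... session.apply(insert) for each row ...

            session.close();  // flushes any remaining buffered operations
        } finally {
            client.close();
        }
    }
}
```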


Thanks,

Alexey

On Wed, Nov 13, 2019 at 4:12 PM Adar Lieber-Dembo <a...@cloudera.com> wrote:

> Oh whoops, I didn't scroll down and missed that. Thanks!
>
> Mauricio's suggestion is a good one. To that I would add: consider
> increasing the number of hash buckets.
>
> Additionally, what does the rest of the primary key look like? _key and
> event_time are in there, but in what order? UUIDs in particular are usually
> a poor choice for primary keys because of their random distribution, which
> all but guarantees lots of compaction during ingest and slows down
> throughput considerably. How bad it is depends on the arrangement of
> columns in the primary key, and how that order reflects (or does not
> reflect) the key order of incoming data.
>
> On Wed, Nov 13, 2019 at 4:00 PM Mauricio Aristizabal <mauri...@impact.com>
> wrote:
>
>> You should start by making sure that each of your 3 hash-partition
>> tablets' leaders is on a different one of your 3 nodes.  It could very
>> well be that all 3 were on the same tablet server and you were ingesting
>> into a single node.  If needed, use leader_step_down
>> <https://kudu.apache.org/docs/command_line_tools_reference.html#tablet-leader_step_down>
>> to move leaders around.
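>>
>> For reference, a sketch of that CLI invocation (the master addresses and
>> the tablet id are placeholders for your cluster):

```shell
# Ask the current leader replica of one tablet to step down; a new leader
# is then elected.  Replace the master list and <tablet_id> with your own.
kudu tablet leader_step_down master1:7051,master2:7051,master3:7051 <tablet_id>
```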
>>
>> FYI Adar, table schema was at bottom inside that iframe
>>
>> On Wed, Nov 13, 2019 at 3:24 PM Adar Lieber-Dembo <a...@cloudera.com>
>> wrote:
>>
>>> Some thoughts on how you might increase your write speed:
>>> - Don't use the same disk for both WAL and data directories. If you
>>> have enough disks, dedicate one for the WAL and the rest for data
>>> directories.
>>> - Since each disk is an SSD, experiment with a higher ratio of MM
>>> threads to data directories. We typically recommend 1:3, but that's
>>> for spinning disks. I see you've configured 2 MM threads for the
>>> masters but are still using just 1 for the tservers; consider using
>>> 2-4.
>>> - How is your schema structured? Are you using hash partitioning?
>>> Range partitioning? Both? What does your primary key look like, and does
>>> incoming data arrive in sorted order (or mostly sorted order) w.r.t.
>>> that key? Random order?
>>> https://kudu.apache.org/docs/schema_design.html is an excellent
>>> resource for understanding how schema can impact writes and reads.
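>>>
>>> As a sketch, the thread-count suggestion would mean adding something
>>> like this to the tserver gflags (4 threads, matching the 4 SSD data
>>> directories, is an assumption to experiment with, not a verified value):

```shell
--maintenance_manager_num_threads=4
```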
>>>
>>> On Wed, Nov 13, 2019 at 3:07 PM wei ximing <wxmimpe...@outlook.com>
>>> wrote:
>>> >
>>> > Hi!
>>> >
>>> > I have some questions about kudu performance tuning.
>>> >
>>> > Kudu version: kudu 1.7.0-cdh5.16.2
>>> >
>>> > System memory per node: 256G
>>> >
>>> > 4 SSDs per machine: 512G
>>> >
>>> > Three Master nodes and three Tserver nodes.
>>> >
>>> > // Master config
>>> > --fs_wal_dir=/mnt/disk1/kudu/var/wal
>>> > --fs_data_dirs=/mnt/disk1/kudu/var/data,/mnt/disk2/kudu/var/data,/mnt/disk3/kudu/var/data,/mnt/disk4/kudu/var/data
>>> > --fs_metadata_dir=/mnt/disk1/kudu/var/metadata
>>> > --log_dir=/mnt/disk1/kudu/var/logs
>>> > --master_addresses=xxxx
>>> > --maintenance_manager_num_threads=2
>>> > --block_cache_capacity_mb=6144
>>> > --memory_limit_hard_bytes=34359738368
>>> > --max_log_size=40
>>> >
>>> > // Tserver config
>>> > --fs_wal_dir=/mnt/disk1/kudu/var/wal
>>> > --fs_data_dirs=/mnt/disk1/kudu/var/data,/mnt/disk2/kudu/var/data,/mnt/disk3/kudu/var/data,/mnt/disk4/kudu/var/data
>>> > --fs_metadata_dir=/mnt/disk1/kudu/var/metadata
>>> > --log_dir=/mnt/disk1/kudu/var/logs
>>> > --tserver_master_addrs=xxxx
>>> > --block_cache_capacity_mb=6144
>>> > --memory_limit_hard_bytes=34359738368
>>> > --max_log_size=40
>>> >
>>> > // Table schema
>>> > // _key is UUID for each msg
>>> > // event_time is data time
>>> > // Schema has only 15 columns
>>> > // Single message does not exceed 100Bytes
>>> >
>>> > HASH (_key) PARTITIONS 3,
>>> > RANGE (event_time) (
>>> >     PARTITION 2019-10-31T16:00:00.000000Z <= VALUES < 2019-11-30T16:00:00.000000Z
>>> > )
>>> >
>>> > I wrote a program to write data to Kudu.
>>> >
>>> > Whether in manual or automatic flush mode, the write speed is only 6MB/s.
>>> >
>>> > I think the SSDs should sustain more than this speed, and the network
>>> > and memory have not reached a bottleneck.
>>> >
>>> > Is this the normal level of Kudu write performance? How can I tune it?
>>> >
>>> >
>>> > Thanks.
>>>
>>
>>
>> --
>> Mauricio Aristizabal
>> Architect - Data Pipeline
>> mauri...@impact.com | 323 309 4260
>> https://impact.com
>>
>
