Todd Lipcon has submitted this change and it was merged.

Change subject: row: optimize copying of MRS rows into the Arena

row: optimize copying of MRS rows into the Arena

I ran a stress workload using YCSB with 100 columns, each a 10-byte
string. I expected this to perform roughly the same as 10 columns
containing 100-byte strings, but in fact it was about 3x slower. A
profile showed most of the CPU consumed in MemRowSet::Insert,
specifically in the inlined Arena::AllocateBytes call. Apparently, with
many threads each allocating every cell of every row separately from
the arena, the arena became a point of contention.

This patch batches the allocation to do a single allocation for all of
the strings to be copied.
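
As a rough illustration of the batching idea (not the actual row.h
change), the sketch below compares per-cell allocation with a single
batched allocation. ToyArena, CellRef, CopyCellsOneByOne and
CopyCellsBatched are hypothetical names invented for this example; the
toy arena only models the fact that every allocation touches one shared
atomic, which is an assumption about where the contention comes from.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy arena: a fixed buffer with an atomic bump pointer.
// This is NOT Kudu's Arena; it only models "one shared atomic update
// per allocation", which is what concurrent writers would contend on.
class ToyArena {
 public:
  explicit ToyArena(size_t capacity)
      : buf_(new uint8_t[capacity]), capacity_(capacity) {}

  // Returns nullptr when the arena is exhausted.
  uint8_t* AllocateBytes(size_t n) {
    num_allocs_.fetch_add(1, std::memory_order_relaxed);
    size_t off = offset_.fetch_add(n, std::memory_order_relaxed);
    if (off + n > capacity_) return nullptr;
    return buf_.get() + off;
  }

  // Number of AllocateBytes calls made so far (for illustration only).
  size_t num_allocs() const {
    return num_allocs_.load(std::memory_order_relaxed);
  }

 private:
  std::unique_ptr<uint8_t[]> buf_;
  size_t capacity_;
  std::atomic<size_t> offset_{0};
  std::atomic<size_t> num_allocs_{0};
};

// Hypothetical reference to a cell's bytes inside the arena.
struct CellRef {
  const uint8_t* data;
  size_t size;
};

// Per-cell copy: one arena allocation for every string cell.
bool CopyCellsOneByOne(const std::vector<std::string>& cells,
                       ToyArena* arena, std::vector<CellRef>* out) {
  for (const std::string& cell : cells) {
    uint8_t* dst = arena->AllocateBytes(cell.size());
    if (dst == nullptr) return false;
    std::memcpy(dst, cell.data(), cell.size());
    out->push_back(CellRef{dst, cell.size()});
  }
  return true;
}

// Batched copy: sum the sizes first, allocate once, then carve the
// single block into per-cell regions.
bool CopyCellsBatched(const std::vector<std::string>& cells,
                      ToyArena* arena, std::vector<CellRef>* out) {
  size_t total = 0;
  for (const std::string& cell : cells) total += cell.size();

  uint8_t* dst = arena->AllocateBytes(total);
  if (dst == nullptr) return false;

  for (const std::string& cell : cells) {
    std::memcpy(dst, cell.data(), cell.size());
    out->push_back(CellRef{dst, cell.size()});
    dst += cell.size();
  }
  return true;
}
```

With 100 cells in a row, the per-cell version updates the shared
offset 100 times while the batched version updates it once, at the
cost of one extra pass over the cells to sum their sizes.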

I didn't do a full run to measure throughput, but it appears to be
roughly 20% faster, and cluster-wide CPU usage is down about 50%. The
MemRowSet::Insert call went from about 50% of the cycles down to <2%.

Change-Id: I6eea882d1d9a7355fb0bbad12c388908ec399a39
Tested-by: Kudu Jenkins
Reviewed-by: David Ribeiro Alves <>
M src/kudu/common/row.h
1 file changed, 23 insertions(+), 5 deletions(-)

  Kudu Jenkins: Verified
  David Ribeiro Alves: Looks good to me, approved

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I6eea882d1d9a7355fb0bbad12c388908ec399a39
Gerrit-Change-Number: 9404
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <>
Gerrit-Reviewer: David Ribeiro Alves <>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <>
