Matthew Jacobs has uploaded a new patch set (#2).

Change subject: IMPALA-4134,IMPALA-3704: Kudu INSERT improvements

IMPALA-4134,IMPALA-3704: Kudu INSERT improvements

1.) IMPALA-4134: Use Kudu AUTO FLUSH
Improves performance of writes to Kudu up to 4.2x in
bulk data loading tests (load 200 million rows from

2.) IMPALA-3704: Improve errors on PK conflicts
The Kudu client reports an error for every PK conflict,
and all errors were being returned in the error status.
As a result, inserts/updates/deletes could return errors
with thousands errors reported. This changes the error
handling to return only the first error in the query error
status, and to log the rest at a lower log level.

3.) Improve the DataSink reporting of the insert stats.
The per-partition stats returned by the data sink weren't
useful for Kudu sinks. Firstly, the number of appended rows
was not being displayed in the profile. Secondly, the
'stats' field isn't populated for Kudu tables and thus was
confusing in the profile, so it is no longer printed if it
is not set in the thrift struct.

Testing: Ran local tests, including new tests to verify
the query profile insert stats. Manual cluster testing was
conducted of the AUTO FLUSH functionality, and that testing
informed the default mutation buffer value of 100MB which
was found to provide good results.

Change-Id: I5542b9a061b01c543a139e8722560b1365f06595
M be/src/exec/
M be/src/exec/
M be/src/exec/kudu-table-sink.h
M be/src/runtime/
M testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test
5 files changed, 141 insertions(+), 57 deletions(-)

  git pull ssh:// refs/changes/28/4728/2
To view, visit
To unsubscribe, visit

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5542b9a061b01c543a139e8722560b1365f06595
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs <>
Gerrit-Reviewer: Matthew Jacobs <>
Gerrit-Reviewer: Todd Lipcon <>

Reply via email to