Hello Thomas Marshall,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/12692

to review the following change.


Change subject: IMPALA-8284. KuduTableSink spends too much CPU in 
KuduSchema::Column()
......................................................................

IMPALA-8284. KuduTableSink spends too much CPU in KuduSchema::Column()

The KuduSchema::Column() accessor returns a copy of the KuduColumnSchema
object, which is not lightweight. We were inadvertently calling this
function once for every NULL cell seen during an insertion, which caused
a performance bottleneck for datasets with large numbers of NULL cells.

This patch improves the situation by caching the nullability of the Kudu
columns in our own vector. The vector lookups should be inlined and
should be much faster than copying a KuduColumnSchema.
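
As a rough illustration of the caching idea (not the code in this patch),
the sketch below uses hypothetical stand-ins, FakeColumnSchema and
FakeSchema, in place of the Kudu client classes. CachingSink, IsNullable()
and CountNullViolations() are likewise made-up names. The only property
assumed from the description above is that Column() returns a copy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for KuduColumnSchema: copying it copies a string.
struct FakeColumnSchema {
  std::string name;
  bool nullable;
};

// Hypothetical stand-in for KuduSchema. Column() returns by value, so each
// call makes a copy; copies_made counts how many times that happens.
class FakeSchema {
 public:
  explicit FakeSchema(std::vector<FakeColumnSchema> cols)
      : cols_(std::move(cols)) {}

  std::size_t num_columns() const { return cols_.size(); }

  FakeColumnSchema Column(std::size_t idx) const {
    assert(idx < cols_.size());
    ++copies_made;
    return cols_[idx];
  }

  mutable std::size_t copies_made = 0;

 private:
  std::vector<FakeColumnSchema> cols_;
};

// Sketch of a sink that reads each column's nullability once, up front.
class CachingSink {
 public:
  explicit CachingSink(const FakeSchema& schema) {
    nullable_.reserve(schema.num_columns());
    for (std::size_t i = 0; i < schema.num_columns(); ++i) {
      nullable_.push_back(schema.Column(i).nullable);  // one copy per column
    }
  }

  // Per-cell check on the hot path: a vector lookup, no schema copy.
  bool IsNullable(std::size_t col) const { return nullable_[col]; }

 private:
  std::vector<bool> nullable_;
};

// Counts NULL cells that fall in non-nullable columns.
// rows[r][c] is true when the cell at row r, column c is NULL.
std::size_t CountNullViolations(const CachingSink& sink,
                                const std::vector<std::vector<bool>>& rows) {
  std::size_t violations = 0;
  for (const auto& row : rows) {
    for (std::size_t c = 0; c < row.size(); ++c) {
      if (row[c] && !sink.IsNullable(c)) ++violations;
    }
  }
  return violations;
}
```

In this sketch the number of schema copies is bounded by the column count,
regardless of how many rows or NULL cells pass through the sink.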

No new tests included as this is a perf fix.

Change-Id: I1b4d14d20252bdb190f50ebaaf6179a46eafb932
---
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
2 files changed, 16 insertions(+), 4 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/12692/1
--
To view, visit http://gerrit.cloudera.org:8080/12692
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I1b4d14d20252bdb190f50ebaaf6179a46eafb932
Gerrit-Change-Number: 12692
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Thomas Marshall <[email protected]>
