Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12692 )
Change subject: IMPALA-8284. KuduTableSink spends too much CPU in KuduSchema::Column()
......................................................................

IMPALA-8284. KuduTableSink spends too much CPU in KuduSchema::Column()

The KuduSchema::Column() accessor actually returns a copy of the KuduColumnSchema object, which is not lightweight. We were inadvertently calling this function once for every null cell seen during an insertion. This caused a performance bottleneck for datasets with large numbers of NULL cells.

This improves the situation by caching the nullability of the Kudu columns in our own vector. The vector lookups should be inlined and much faster than copying a KuduColumnSchema.

No new tests included as this is a perf fix.

Change-Id: I1b4d14d20252bdb190f50ebaaf6179a46eafb932
Reviewed-on: http://gerrit.cloudera.org:8080/12692
Reviewed-by: Will Berkeley <[email protected]>
Reviewed-by: Thomas Marshall <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
2 files changed, 16 insertions(+), 4 deletions(-)

Approvals:
  Will Berkeley: Looks good to me, but someone else must approve
  Thomas Marshall: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/12692
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I1b4d14d20252bdb190f50ebaaf6179a46eafb932
Gerrit-Change-Number: 12692
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Thomas Marshall <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Will Berkeley <[email protected]>
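
For readers without the diff at hand, the caching idea in the commit message can be sketched roughly as follows. This is not the merged patch: FakeColumnSchema, FakeSchema, and NullabilityCache are hypothetical stand-ins invented for illustration, and the use of std::vector<bool> is an assumption (the message only says "our own vector"). The only behavior taken from the message is that the schema's Column() accessor returns a copy, and that nullability is read once per column up front instead of once per NULL cell.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a column schema object; not the Kudu client API.
struct FakeColumnSchema {
  std::string name;
  bool nullable;
  bool is_nullable() const { return nullable; }
};

// Hypothetical stand-in for a table schema whose accessor returns by value.
class FakeSchema {
 public:
  explicit FakeSchema(std::vector<FakeColumnSchema> cols)
      : cols_(std::move(cols)) {}

  std::size_t num_columns() const { return cols_.size(); }

  // Returns a copy, mimicking the cost described in the commit message.
  // The counter exists only so the sketch can show how often copies happen.
  FakeColumnSchema Column(std::size_t idx) const {
    ++copies_;
    return cols_[idx];
  }

  std::size_t copies() const { return copies_; }

 private:
  std::vector<FakeColumnSchema> cols_;
  mutable std::size_t copies_ = 0;
};

// Reads each column's nullability once, then answers from a plain vector.
class NullabilityCache {
 public:
  explicit NullabilityCache(const FakeSchema& schema) {
    nullable_.reserve(schema.num_columns());
    for (std::size_t i = 0; i < schema.num_columns(); ++i) {
      nullable_.push_back(schema.Column(i).is_nullable());
    }
  }

  // Cheap per-cell lookup: no schema object is copied here.
  bool IsNullable(std::size_t col_idx) const {
    assert(col_idx < nullable_.size());
    return nullable_[col_idx];
  }

 private:
  std::vector<bool> nullable_;
};
```

In this sketch, constructing NullabilityCache costs one Column() copy per column, and every later IsNullable() call is a bounds-checked vector read, so the number of copies no longer grows with the number of NULL cells written.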
