[jira] [Commented] (KUDU-2381) Optimize DeltaMemStore for case of no matching deltas
[ https://issues.apache.org/jira/browse/KUDU-2381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16865273#comment-16865273 ]

ZhangYao commented on KUDU-2381:
--------------------------------

This week I will try to work on it. :)

> Optimize DeltaMemStore for case of no matching deltas
> -----------------------------------------------------
>
>                 Key: KUDU-2381
>                 URL: https://issues.apache.org/jira/browse/KUDU-2381
>             Project: Kudu
>          Issue Type: Improvement
>          Components: perf, tablet
>            Reporter: Todd Lipcon
>            Assignee: ZhangYao
>            Priority: Major
>
> In a scan workload that reads 280 columns, I currently see DeltaMemStore
> iteration taking up a significant share of the scan's CPU, even though the
> dataset has no updates. Of the 1.6 s spent in
> MaterializingIterator::NextBlock, 0.61 s went to DMSIterator::PrepareBatch
> and 0.14 s to DMSIterator::MayHaveDeltas. So about 47% of the time here
> ((0.61 s + 0.14 s) / 1.6 s) is wasted work.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (KUDU-2381) Optimize DeltaMemStore for case of no matching deltas
[ https://issues.apache.org/jira/browse/KUDU-2381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16861340#comment-16861340 ]

Todd Lipcon commented on KUDU-2381:
-----------------------------------

I looked again at a similar workload and still see this behavior. An easy fix is likely a new bool such as 'DeltaPreparer::_may_have_deltas', set to true in every spot where deleted_, reinserted_, or updates_by_col_ is modified. We can then use it to implement MayHaveDeltas cheaply and to short-circuit the clearing of updates_by_col_ in the common case of no deltas.

> Optimize DeltaMemStore for case of no matching deltas
> -----------------------------------------------------
>
>                 Key: KUDU-2381
>                 URL: https://issues.apache.org/jira/browse/KUDU-2381
>             Project: Kudu
>          Issue Type: Improvement
>          Components: perf, tablet
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Major
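A minimal sketch of the flag approach proposed in this comment, written as a toy class rather than Kudu's real DeltaPreparer. ToyDeltaPreparer, StartBatch, the Add* methods and the (row, value) layout of UpdatesForColumn are all hypothetical; the flag is spelled may_have_deltas_ here only to match the trailing-underscore style of deleted_ and reinserted_. Every mutation site sets the flag, so MayHaveDeltas() becomes a single load and the per-column clearing loop is skipped entirely when a batch had no deltas.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for one column's updates within a batch:
// (row index within the batch, new value). Not Kudu's real type.
using UpdatesForColumn = std::vector<std::pair<std::size_t, std::int64_t>>;

// Toy model of a delta preparer that remembers whether the current batch
// recorded any deltas, so the common no-delta case skips per-column work.
class ToyDeltaPreparer {
 public:
  explicit ToyDeltaPreparer(std::size_t num_cols) : updates_by_col_(num_cols) {}

  // Hypothetical per-batch reset, playing the role of the clearing loop
  // that runs when a new batch is prepared.
  void StartBatch() {
    columns_visited_ = 0;
    if (!may_have_deltas_) return;  // Nothing recorded: skip the loop.
    for (UpdatesForColumn& ufc : updates_by_col_) {
      ufc.clear();
      ++columns_visited_;
    }
    deleted_.clear();
    reinserted_.clear();
    may_have_deltas_ = false;
  }

  // Every spot that modifies deleted_, reinserted_ or updates_by_col_
  // also sets the flag.
  void AddUpdate(std::size_t col, std::size_t row, std::int64_t value) {
    assert(col < updates_by_col_.size());
    updates_by_col_[col].emplace_back(row, value);
    may_have_deltas_ = true;
  }
  void AddDelete(std::size_t row) {
    deleted_.push_back(row);
    may_have_deltas_ = true;
  }
  void AddReinsert(std::size_t row) {
    reinserted_.push_back(row);
    may_have_deltas_ = true;
  }

  // O(1) instead of scanning every column's update list.
  bool MayHaveDeltas() const { return may_have_deltas_; }

  // Instrumentation for illustration: columns cleared by the last StartBatch().
  std::size_t columns_visited() const { return columns_visited_; }

 private:
  std::vector<UpdatesForColumn> updates_by_col_;
  std::vector<std::size_t> deleted_;
  std::vector<std::size_t> reinserted_;
  bool may_have_deltas_ = false;
  std::size_t columns_visited_ = 0;
};
```

With 280 columns and no deltas, StartBatch() returns before touching any column; after a single update it clears all 280 lists once and resets the flag. A single flag keeps the hot no-delta path to one branch, but any update in a batch still makes the next reset walk every column.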
[jira] [Commented] (KUDU-2381) Optimize DeltaMemStore for case of no matching deltas
[ https://issues.apache.org/jira/browse/KUDU-2381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415901#comment-16415901 ]

Todd Lipcon commented on KUDU-2381:
-----------------------------------

Looking at the perf report, the majority of the CPU time seems to be in two spots. In MayHaveDeltas:

{code}
for (auto& col : updates_by_col_) {
  if (!col.empty()) {
    return true;
  }
}
{code}

and in the loop that clears the per-column updates:

{code}
for (UpdatesForColumn& ufc : updates_by_col_) {
  ufc.clear();
}
{code}

Both of these end up being no-ops when the previously scanned blocks contain no updates, yet each still iterates over every entry of updates_by_col_. So I think it would make sense to initialize updates_by_col_ more lazily.

> Optimize DeltaMemStore for case of no matching deltas
> -----------------------------------------------------
>
>                 Key: KUDU-2381
>                 URL: https://issues.apache.org/jira/browse/KUDU-2381
>             Project: Kudu
>          Issue Type: Improvement
>          Components: perf, tablet
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Major
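One possible reading of "more lazy about initializing updates_by_col_" is sketched below, as an assumption rather than the change Kudu actually made. The per-column lists are allocated only when the first update arrives, and a list of written ("dirty") column indices lets both MayHaveDeltas() and the reset step cost O(columns touched) instead of O(columns scanned). LazyColumnUpdates, Record, Reset and dirty_cols_ are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical (row index, new value) list for one column; not Kudu's type.
using UpdatesForColumn = std::vector<std::pair<std::size_t, std::int64_t>>;

// Toy container that defers per-column work until updates actually arrive.
class LazyColumnUpdates {
 public:
  explicit LazyColumnUpdates(std::size_t num_cols) : num_cols_(num_cols) {}

  void Record(std::size_t col, std::size_t row, std::int64_t value) {
    assert(col < num_cols_);
    // Allocate the per-column lists only when the first update shows up.
    if (updates_by_col_.empty()) updates_by_col_.resize(num_cols_);
    UpdatesForColumn& ufc = updates_by_col_[col];
    // First update for this column since the last Reset(): remember it.
    if (ufc.empty()) dirty_cols_.push_back(col);
    ufc.emplace_back(row, value);
  }

  // O(1): true iff some column received an update since the last Reset().
  bool MayHaveDeltas() const { return !dirty_cols_.empty(); }

  // Clears only the columns that were written; returns how many it cleared.
  std::size_t Reset() {
    const std::size_t cleared = dirty_cols_.size();
    for (std::size_t col : dirty_cols_) updates_by_col_[col].clear();
    dirty_cols_.clear();
    return cleared;
  }

  // True once the per-column lists have been allocated.
  bool allocated() const { return !updates_by_col_.empty(); }

 private:
  std::size_t num_cols_;
  std::vector<UpdatesForColumn> updates_by_col_;  // empty until first update
  std::vector<std::size_t> dirty_cols_;          // columns written since Reset()
};
```

Compared with a single "may have deltas" flag, the dirty list also avoids clearing all 280 columns when only a few of them received updates, at the price of one extra push_back per newly touched column.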