[jira] [Commented] (KUDU-2381) Optimize DeltaMemStore for case of no matching deltas

2019-06-16 Thread ZhangYao (JIRA)


[ https://issues.apache.org/jira/browse/KUDU-2381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16865273#comment-16865273 ]

ZhangYao commented on KUDU-2381:


This week I will try to work on it. :)

> Optimize DeltaMemStore for case of no matching deltas
> -
>
> Key: KUDU-2381
> URL: https://issues.apache.org/jira/browse/KUDU-2381
> Project: Kudu
> Issue Type: Improvement
> Components: perf, tablet
> Reporter: Todd Lipcon
> Assignee: ZhangYao
> Priority: Major
>
> Currently, in a scan workload that scans 280 columns, I see DeltaMemStore
> iteration taking up a significant amount of CPU in the scan, even though
> the dataset has no updates. Of 1.6s spent in
> MaterializingIterator::NextBlock, 0.61s went to DMSIterator::PrepareBatch
> and 0.14s to DMSIterator::MayHaveDeltas. So about 47% (0.75s of 1.6s) of our
> time here is spent on wasted work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KUDU-2381) Optimize DeltaMemStore for case of no matching deltas

2019-06-11 Thread Todd Lipcon (JIRA)


[ https://issues.apache.org/jira/browse/KUDU-2381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16861340#comment-16861340 ]

Todd Lipcon commented on KUDU-2381:
---

Looking again at a similar workload, I still see this behavior. An easy fix
is likely a new bool such as 'DeltaPreparer::may_have_deltas_', set to true in
every spot where deleted_, reinserted_, and updates_by_col_ are modified. We
can use it to implement MayHaveDeltas cheaply and to short-circuit the
clearing of updates_by_col_ in the common case of no deltas.
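A minimal C++ sketch of the flag-based fix described above, under simplified assumptions: DeltaPreparerSketch, StartBatch, AddDelete, AddReinsert, AddUpdate and the int64_t row IDs are hypothetical stand-ins, not Kudu's actual DeltaPreparer API. The flag is set wherever a delta is recorded, so MayHaveDeltas becomes a single load and the per-batch reset skips the per-column loop when nothing was recorded.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for DeltaPreparer state. Only the members
// named in the comment (deleted_, reinserted_, updates_by_col_) are modeled;
// their element types here are assumptions.
class DeltaPreparerSketch {
 public:
  explicit DeltaPreparerSketch(size_t num_cols) : updates_by_col_(num_cols) {}

  // Hypothetical per-batch reset. If no delta was recorded since the last
  // reset, every container is already empty, so the column loop is skipped.
  void StartBatch() {
    if (!may_have_deltas_) return;
    deleted_.clear();
    reinserted_.clear();
    for (auto& ufc : updates_by_col_) {
      ufc.clear();
    }
    may_have_deltas_ = false;
  }

  // Every mutation path sets the flag.
  void AddDelete(int64_t row) {
    deleted_.push_back(row);
    may_have_deltas_ = true;
  }
  void AddReinsert(int64_t row) {
    reinserted_.push_back(row);
    may_have_deltas_ = true;
  }
  void AddUpdate(size_t col, int64_t row) {
    updates_by_col_[col].push_back(row);
    may_have_deltas_ = true;
  }

  // O(1) instead of scanning every column's update list.
  bool MayHaveDeltas() const { return may_have_deltas_; }

 private:
  std::vector<int64_t> deleted_;
  std::vector<int64_t> reinserted_;
  std::vector<std::vector<int64_t>> updates_by_col_;
  bool may_have_deltas_ = false;
};
```

Because the flag only has to avoid false negatives, setting it unconditionally on every mutation is enough; it is cleared only on the reset path.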

> Optimize DeltaMemStore for case of no matching deltas
> -
>
> Key: KUDU-2381
> URL: https://issues.apache.org/jira/browse/KUDU-2381
> Project: Kudu
>  Issue Type: Improvement
>  Components: perf, tablet
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Major
>





[jira] [Commented] (KUDU-2381) Optimize DeltaMemStore for case of no matching deltas

2018-03-27 Thread Todd Lipcon (JIRA)

[ https://issues.apache.org/jira/browse/KUDU-2381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415901#comment-16415901 ]

Todd Lipcon commented on KUDU-2381:
---

Looking at the perf report, the majority of CPU seems to be in two spots:

In MayHaveDeltas:
{code}
  for (auto& col : updates_by_col_) {
    if (!col.empty()) {
      return true;
    }
  }
{code}

And where updates_by_col_ is cleared for each new batch:

{code}
  for (UpdatesForColumn& ufc : updates_by_col_) {
    ufc.clear();
  }
{code}

Both of these are effectively no-ops when the previously scanned blocks had no
updates, yet each still walks every entry of updates_by_col_. So I think it
would make sense to be lazier about initializing updates_by_col_.
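One possible reading of "lazier" is to remember which columns actually received updates, so that both the emptiness check and the per-batch reset scale with the number of touched columns rather than with the number of projected columns. The sketch below assumes that approach; LazyColumnUpdates, AddUpdate, Reset, UpdatesFor and the int64_t row type are hypothetical names, and this is not necessarily what Kudu implemented.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: track which columns received updates so per-batch
// work is proportional to touched columns, not to all projected columns.
class LazyColumnUpdates {
 public:
  explicit LazyColumnUpdates(size_t num_cols)
      : updates_by_col_(num_cols), touched_(num_cols, false) {}

  // Records an update and remembers the column the first time it is touched.
  void AddUpdate(size_t col, int64_t row) {
    if (!touched_[col]) {
      touched_[col] = true;
      dirty_cols_.push_back(col);
    }
    updates_by_col_[col].push_back(row);
  }

  // Same answer as "is any column non-empty", but O(1).
  bool MayHaveUpdates() const { return !dirty_cols_.empty(); }

  // Clears only the columns written since the last reset.
  void Reset() {
    for (size_t col : dirty_cols_) {
      updates_by_col_[col].clear();
      touched_[col] = false;
    }
    dirty_cols_.clear();
  }

  const std::vector<int64_t>& UpdatesFor(size_t col) const {
    return updates_by_col_[col];
  }

 private:
  std::vector<std::vector<int64_t>> updates_by_col_;
  std::vector<bool> touched_;      // per-column "already in dirty_cols_" bit
  std::vector<size_t> dirty_cols_; // columns to clear on the next Reset()
};
```

Compared with a single flag, this also avoids clearing untouched columns when only a few columns have updates, at the cost of one extra bit per column and a small index list.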
