Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3823: Add timer to measure Parquet footer reads

Patch Set 4:


I was hoping to get to +1 this pass but there's a bit more still...

Most importantly there's a bug in computing max and I think we need unit tests.
File be/src/exec/

PS4, Line 196: RuntimeProfile::Counter 
this is unnecessary, SCOPED_TIMER below just creates a ScopedTimer which 
creates a MonotonicStopWatch and adds the result of it to this counter, so you 
might as well just use MonotonicStopWatch directly.
File be/src/util/runtime-profile-counters.h:

please rename


PS4, Line 147: .
can you add:
This is used to aggregate child counters.

PS4, Line 204: /// Contrary to the AveragedCounter, this only keeps a track of 
values and not 'Counter'
             : /// objects themselves.
this is a bit misleading because it doesn't really keep values, it keeps the 
statistics about the values.

How about:
Contrary to the AveragedCoutner, this keeps statistics about raw values whereas 
AveragedCounter is used for averaging child counters.

PS4, Line 249:  /// Current sums of values seen so far and min/max values seen 
so far.
             :   int64_t current_min_;
             :   int64_t current_max_;
             :   int64_t current_sum_;
current is unnecessary
File be/src/util/

PS4, Line 301:       const TSummaryStatsCounter& c = 
             :       SummaryStatsCounterMap::iterator it = 
             :       if (it == summary_stats_map_.end()) {
             :         summary_stats_map_[] =
             :             pool_->Add(new SummaryStatsCounter(
             :                 c.unit, c.total_num_values, c.min_value, 
c.max_value, c.sum));
             :       } else {
             :         it->second->SetStats(c);
             :       }
it's hard to know if this gets exercised in your test case.

Can you add a test in that sets up a stats counter and 
goes through a merge?

PS4, Line 667:  
super nit: extra space

Line 668:     for (const SummaryStatsCounterMap::value_type& v: 
summary_stats_map_) {
I think we need a separate case to handle when total_num_values is 0 because 
avg/min/max don't make sense on 0 samples, plus, min and max will print out 
INT_MAX/INT_MIN which is an implementation detail.

I think this would benefit from a test case in as well

PS4, Line 669: (
this brace isn't closed at the end

PS4, Line 676: v.second->unit()
this isn't in the same Counter's units (which is time for the use case you 
added), this is just a raw count. I think you should just print this 
total_num_values without pretty printer.

PS4, Line 1017:  else if
this isn't an else if, you need to execute both branches

this will be wrong when there is just 1 sample.
File tests/query_test/

PS4, Line 336: MinMaxAvgValueCounter

PS4, Line 337: averages the time taken to
             :     # scan and process the Parquet footer across scan ranges and 
also records the min and
             :     # the max time.
keeps statistics for the time taken to scan and Process the Parquet footer 
across scan ranges

PS4, Line 340:     if ranges_per_node > 1:
             :       num_equal = 0
             :       for min_max_time in footer_processing_time_list:
             :         num_equal += (min_max_time[0] == min_max_time[1])
             :       # The chance that the times taken by different scan ranges 
to process the footer,
             :       # are equal, is very unlikely. However, it could happen 
which could make the min and
             :       # max time to process a footer equal. Assert that at a 
time, no more than 1 node has
             :       # equal min and max time.
             :       assert num_equal < 2
following up with my previous comment, I still think this is weird and was 
hiding at least 1 bug (the max wasn't set when there was only 1 sample). I'm 
not really sure what this verification gets us.

It's fine if you can't compare it to the total time, but please at least check 
that if ranges_per_node is 1 or you see that the number of samples is 1, then 
min = max = avg != 0. That's easy since you don't have to do any string parsing.

That said, I do think we should have some tests that do some better validation, 
e.g. unit tests in Then we don't need to worry about 
parsing times here and we get some more clear coverage.

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil <>
Gerrit-Reviewer: Dan Hecht <>
Gerrit-Reviewer: Matthew Jacobs <>
Gerrit-Reviewer: Sailesh Mukil <>
Gerrit-HasComments: Yes

Reply via email to