[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads

2016-10-30 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has submitted this change and it was merged.

Change subject: IMPALA-3823: Add timer to measure Parquet footer reads
..


IMPALA-3823: Add timer to measure Parquet footer reads

It's been observed that Parquet footer reads perform poorly especially
when reading from S3. This patch adds a timer "FooterProcessingTimer"
which keeps a track of the average time each split of each scan node
spends in reading and processing the parquet footer.

Added a new utility counter called SummaryStatsCounter which keeps
track of the min, max and average values seen so far from a set of
values. This counter is used to calculate the min, max and average
time taken to scan and process Parquet footers per query per node.

The RuntimeProfile has also been updated to keep a track of, display
and serialize this new counter to thrift.

BE tests have been added to verify that this counter works fine.

Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf
Reviewed-on: http://gerrit.cloudera.org:8080/4371
Reviewed-by: Sailesh Mukil 
Tested-by: Marcel Kornacker 
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile-test.cc
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M common/thrift/RuntimeProfile.thrift
M tests/query_test/test_scanners.py
8 files changed, 303 insertions(+), 16 deletions(-)

Approvals:
  Marcel Kornacker: Verified
  Sailesh Mukil: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/4371
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads

2016-10-30 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3823: Add timer to measure Parquet footer reads
..


Patch Set 11: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4371
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads

2016-10-28 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-3823: Add timer to measure Parquet footer reads
..


Patch Set 11: Code-Review+2

(1 comment)

Carry +2

http://gerrit.cloudera.org:8080/#/c/4371/10/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

Line 209:   SummaryStatsCounter(TUnit::type unit, int32_t total_num_values,
> I'm not sure what this comment means -- this is a constructor, so how can t
Oops, that was meant for SetStats(). I put that here by mistake. I've removed 
this line here now.


-- 
To view, visit http://gerrit.cloudera.org:8080/4371
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads

2016-10-28 Thread Sailesh Mukil (Code Review)
Hello Henry Robinson, Matthew Jacobs, Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4371

to look at the new patch set (#11).

Change subject: IMPALA-3823: Add timer to measure Parquet footer reads
..

IMPALA-3823: Add timer to measure Parquet footer reads

It's been observed that Parquet footer reads perform poorly especially
when reading from S3. This patch adds a timer "FooterProcessingTimer"
which keeps a track of the average time each split of each scan node
spends in reading and processing the parquet footer.

Added a new utility counter called SummaryStatsCounter which keeps
track of the min, max and average values seen so far from a set of
values. This counter is used to calculate the min, max and average
time taken to scan and process Parquet footers per query per node.

The RuntimeProfile has also been updated to keep a track of, display
and serialize this new counter to thrift.

BE tests have been added to verify that this counter works fine.

Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile-test.cc
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M common/thrift/RuntimeProfile.thrift
M tests/query_test/test_scanners.py
8 files changed, 303 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/4371/11
-- 
To view, visit http://gerrit.cloudera.org:8080/4371
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads

2016-10-26 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-3823: Add timer to measure Parquet footer reads
..


Patch Set 10: Code-Review+1

(3 comments)

Sorry for the delay. I've made the changes.
Carry +1

http://gerrit.cloudera.org:8080/#/c/4371/9/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

PS9, Line 256: // Protects min
> comment on what this protects
Done


http://gerrit.cloudera.org:8080/#/c/4371/9/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

PS9, Line 1019:   lock_guard 
> Was there a reason not to take lock_ here?
No, that was a mistake. Added it now.


http://gerrit.cloudera.org:8080/#/c/4371/9/common/thrift/RuntimeProfile.thrift
File common/thrift/RuntimeProfile.thrift:

PS9, Line 94: summary stats counter
> update
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4371
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads

2016-10-26 Thread Sailesh Mukil (Code Review)
Hello Henry Robinson, Matthew Jacobs,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4371

to look at the new patch set (#10).

Change subject: IMPALA-3823: Add timer to measure Parquet footer reads
..

IMPALA-3823: Add timer to measure Parquet footer reads

It's been observed that Parquet footer reads perform poorly especially
when reading from S3. This patch adds a timer "FooterProcessingTimer"
which keeps a track of the average time each split of each scan node
spends in reading and processing the parquet footer.

Added a new utility counter called SummaryStatsCounter which keeps
track of the min, max and average values seen so far from a set of
values. This counter is used to calculate the min, max and average
time taken to scan and process Parquet footers per query per node.

The RuntimeProfile has also been updated to keep a track of, display
and serialize this new counter to thrift.

BE tests have been added to verify that this counter works fine.

Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile-test.cc
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M common/thrift/RuntimeProfile.thrift
M tests/query_test/test_scanners.py
8 files changed, 304 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/4371/10
-- 
To view, visit http://gerrit.cloudera.org:8080/4371
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads

2016-10-18 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3823: Add timer to measure Parquet footer reads
..


Patch Set 9:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4371/9/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

PS9, Line 256: SpinLock lock_;
comment on what this protects


http://gerrit.cloudera.org:8080/#/c/4371/9/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

PS9, Line 1019:   counter->name = name;
Was there a reason not to take lock_ here?


http://gerrit.cloudera.org:8080/#/c/4371/9/common/thrift/RuntimeProfile.thrift
File common/thrift/RuntimeProfile.thrift:

PS9, Line 94: min max average value
update


-- 
To view, visit http://gerrit.cloudera.org:8080/4371
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads

2016-10-18 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-3823: Add timer to measure Parquet footer reads
..


Patch Set 8:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/4371/8/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

PS8, Line 204: Contrary to 
> "Unlike"
Done


PS8, Line 204: a 
> remove
Done


PS8, Line 206: The average is stored in the 'value_' member of the base class.
> Referring to member variables in a class description is usually a sign the 
Yes, I've changed it to say value() instead.


Line 209:   SummaryStatsCounter(TUnit::type unit, int32_t total_num_values,
> Comment what this constructor is used for (is it when merging two SSCounter
It's for replacing the existing counter with new values.


PS8, Line 216: total_num_values
> can total_num_values be 0?
I don't think it happens in practice, but I've made a change to handle that 
case too.


PS8, Line 247: This keeps track of t
> Remove - just "The total..." is sufficient.
Done


PS8, Line 255: SpinLock counter_lock_;
> any reason not to call this lock_ ?
Nope, changed it.


http://gerrit.cloudera.org:8080/#/c/4371/8/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

PS8, Line 664: /* Print all SummaryStatsCounters as following:
 : : (Avg:  ; Min:  ; Max:  ;
 : Number of samples: ) */
> Comment style?
Done


PS8, Line 669: are
> is
Done


PS8, Line 1019: { min_ = new_value; }
> remove parentheses for single-line if statements.
Done


http://gerrit.cloudera.org:8080/#/c/4371/8/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

PS8, Line 317: \
> remove
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4371
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads

2016-10-18 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-3823: Add timer to measure Parquet footer reads
..


Patch Set 9: Code-Review+1

Carry +1.

-- 
To view, visit http://gerrit.cloudera.org:8080/4371
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads

2016-10-18 Thread Sailesh Mukil (Code Review)
Hello Henry Robinson, Matthew Jacobs,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4371

to look at the new patch set (#9).

Change subject: IMPALA-3823: Add timer to measure Parquet footer reads
..

IMPALA-3823: Add timer to measure Parquet footer reads

It's been observed that Parquet footer reads perform poorly especially
when reading from S3. This patch adds a timer "FooterProcessingTimer"
which keeps a track of the average time each split of each scan node
spends in reading and processing the parquet footer.

Added a new utility counter called SummaryStatsCounter which keeps
track of the min, max and average values seen so far from a set of
values. This counter is used to calculate the min, max and average
time taken to scan and process Parquet footers per query per node.

The RuntimeProfile has also been updated to keep a track of, display
and serialize this new counter to thrift.

BE tests have been added to verify that this counter works fine.

Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile-test.cc
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M common/thrift/RuntimeProfile.thrift
M tests/query_test/test_scanners.py
8 files changed, 302 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/4371/9
-- 
To view, visit http://gerrit.cloudera.org:8080/4371
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads

2016-10-18 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3823: Add timer to measure Parquet footer reads
..


Patch Set 8:

I can look at this once you address Henry's comments.

-- 
To view, visit http://gerrit.cloudera.org:8080/4371
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads

2016-10-17 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-3823: Add timer to measure Parquet footer reads
..


Patch Set 8: Code-Review+1

(11 comments)

http://gerrit.cloudera.org:8080/#/c/4371/8/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

PS8, Line 204: Contrary to 
"Unlike"


PS8, Line 204: a 
remove


PS8, Line 206: The average is stored in the 'value_' member of the base class.
Referring to member variables in a class description is usually a sign the text 
is leaking implementation details. Do you mean to say that value() returns the 
average?


Line 209:   SummaryStatsCounter(TUnit::type unit, int32_t total_num_values,
Comment what this constructor is used for (is it when merging two SSCounters 
together?)


PS8, Line 216: total_num_values
can total_num_values be 0?


PS8, Line 247: This keeps track of t
Remove - just "The total..." is sufficient.


PS8, Line 255: SpinLock counter_lock_;
any reason not to call this lock_ ?


http://gerrit.cloudera.org:8080/#/c/4371/8/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

PS8, Line 664: /* Print all SummaryStatsCounters as following:
 : : (Avg:  ; Min:  ; Max:  ;
 : Number of samples: ) */
Comment style?


PS8, Line 669: are
is


PS8, Line 1019: { min_ = new_value; }
remove parentheses for single-line if statements.


http://gerrit.cloudera.org:8080/#/c/4371/8/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

PS8, Line 317: \
remove


-- 
To view, visit http://gerrit.cloudera.org:8080/4371
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads

2016-09-26 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-3823: Add timer to measure Parquet footer reads
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4371/6/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

Line 228:   int32_t total_num_values() const { return total_num_values_; }
> Since it's only a read, I figured even if the CPU reorders the read instruc
Without reader synchronization you can probably have the following sequence:

Previously, counter.min = counter.max = 0

Writer:
  counter.min = V
  counter.max = V

Reader:
  mn = counter->min_value() // == V
  mx = counter->max_value() // == 0
  DCHECK_LE(mn, mx) << "Fails"

So you would need to synchronize with the writer to ensure that an update 
appears atomic. 

TSO helps you avoid certain kinds of ordering bugs - but remember that others 
are trying to port this to PPC. You have, in 
RuntimeProfile::SummaryStatsCounter::UpdateCounter():

  ++total_num_values_;
  sum_ += new_value;

If you did an unsynchronized read of sum and total_num_values on a non-TSO 
architecture, you could easily get total_num_values == 0 but sum > 0.


-- 
To view, visit http://gerrit.cloudera.org:8080/4371
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads

2016-09-15 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-3823: Add timer to measure Parquet footer reads
..


Patch Set 6: Code-Review+1

(3 comments)

Carry +1

http://gerrit.cloudera.org:8080/#/c/4371/5/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

PS5, Line 1020:   if (new_value < min_) { min_ = new_value; }
  :   if (new_value > max_) { max_ = new_value; }
  : }
> 1 line
Done


PS5, Line 1023: 
  : void RuntimeProfile::SummaryStatsCounter::SetStats(const 
TSummaryStatsCounter& counter) {
  :   l
> 1 line
Done


http://gerrit.cloudera.org:8080/#/c/4371/5/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

Line 338: # and average values as the same since we have only one sample 
(i.e. only one range)
> can you add a TODO for handling the case when there is more than one range 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4371
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads

2016-09-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded a new patch set (#5).

Change subject: IMPALA-3823: Add timer to measure Parquet footer reads
..

IMPALA-3823: Add timer to measure Parquet footer reads

It's been observed that Parquet footer reads perform poorly especially
when reading from S3. This patch adds a timer "FooterProcessingTimer"
which keeps a track of the average time each split of each scan node
spends in reading and processing the parquet footer.

Added a new utility counter called SummaryStatsCounter which keeps
track of the min, max and average values seen so far from a set of
values. This counter is used to calculate the min, max and average
time taken to scan and process Parquet footers per query per node.

The RuntimeProfile has also been updated to keep a track of, display
and serialize this new counter to thrift.

BE tests have been added to verify that this counter works fine.

Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile-test.cc
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M common/thrift/RuntimeProfile.thrift
M tests/query_test/test_scanners.py
8 files changed, 288 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/4371/5
-- 
To view, visit http://gerrit.cloudera.org:8080/4371
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads

2016-09-12 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3823: Add timer to measure Parquet footer reads
..


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/4371/3/be/src/exec/hdfs-parquet-scanner.h
File be/src/exec/hdfs-parquet-scanner.h:

PS3, Line 430: avg_timer_
let's update the variable names too. e.g. something like 
process_footer_timer_stats_ or such


http://gerrit.cloudera.org:8080/#/c/4371/2/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

PS2, Line 240: t TSummaryStatsCount
> Yes that's right. The only reason I kept it as a Counter instead of a value
Thanks

I think it's more useful to remove the restriction. Even in 
hdfs-parquet-scanner where you're using this, there is a Counter constructed 
just for it's timer, it's not included in a profile. That code might as well 
have just used a timing function explicitly. (I'm not saying it's worth 
changing that code, just that there's no dependency on this taking Counters.)

Also, I don't think we get much from checking the Counter's unit. We don't 
really set counter units carefully in many cases (e.g. we use UNIT all over the 
place instead of anything useful), and ultimately it's up to the user of this 
interface to make sure what they're providing here makes sense.


http://gerrit.cloudera.org:8080/#/c/4371/3/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

PS3, Line 240: Overwrite
"SetStats()" to be more consistent with some of the other fns, e.g. 
SetSamples(). If I saw Overwrite() I'd wonder why there wasn't just a copy 
constructor or something like that.


http://gerrit.cloudera.org:8080/#/c/4371/2/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

Line 661:   }
> Done
thanks!


http://gerrit.cloudera.org:8080/#/c/4371/3/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

PS3, Line 668: - 
I don't see this dash in the example above? Is it intentional? I see the event 
markers under an event sequence make a list, though those are items of a 
specific timeline. I don't see why that'd be the case here.

Also value() is an average, right? I think it'd be good to have that specified.


It'd be nice to include the number of samples as well. Also, avg isn't really 
all that different from the other stats, so maybe just print them all the same 
way, e.g.:

" :  avg: , min: , max: , num_samples: "


-- 
To view, visit http://gerrit.cloudera.org:8080/4371
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes