[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads
Marcel Kornacker has submitted this change and it was merged. Change subject: IMPALA-3823: Add timer to measure Parquet footer reads .. IMPALA-3823: Add timer to measure Parquet footer reads It's been observed that Parquet footer reads perform poorly especially when reading from S3. This patch adds a timer "FooterProcessingTimer" which keeps a track of the average time each split of each scan node spends in reading and processing the parquet footer. Added a new utility counter called SummaryStatsCounter which keeps track of the min, max and average values seen so far from a set of values. This counter is used to calculate the min, max and average time taken to scan and process Parquet footers per query per node. The RuntimeProfile has also been updated to keep a track of, display and serialize this new counter to thrift. BE tests have been added to verify that this counter works fine. Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf Reviewed-on: http://gerrit.cloudera.org:8080/4371 Reviewed-by: Sailesh MukilTested-by: Marcel Kornacker --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile-test.cc M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M common/thrift/RuntimeProfile.thrift M tests/query_test/test_scanners.py 8 files changed, 303 insertions(+), 16 deletions(-) Approvals: Marcel Kornacker: Verified Sailesh Mukil: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/4371 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-3823: Add timer to measure Parquet footer reads .. Patch Set 11: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4371 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-3823: Add timer to measure Parquet footer reads .. Patch Set 11: Code-Review+2 (1 comment) Carry +2 http://gerrit.cloudera.org:8080/#/c/4371/10/be/src/util/runtime-profile-counters.h File be/src/util/runtime-profile-counters.h: Line 209: SummaryStatsCounter(TUnit::type unit, int32_t total_num_values, > I'm not sure what this comment means -- this is a constructor, so how can t Oops, that was meant for SetStats(). I put that here by mistake. I've removed this line here now. -- To view, visit http://gerrit.cloudera.org:8080/4371 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads
Hello Henry Robinson, Matthew Jacobs, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4371 to look at the new patch set (#11). Change subject: IMPALA-3823: Add timer to measure Parquet footer reads .. IMPALA-3823: Add timer to measure Parquet footer reads It's been observed that Parquet footer reads perform poorly especially when reading from S3. This patch adds a timer "FooterProcessingTimer" which keeps a track of the average time each split of each scan node spends in reading and processing the parquet footer. Added a new utility counter called SummaryStatsCounter which keeps track of the min, max and average values seen so far from a set of values. This counter is used to calculate the min, max and average time taken to scan and process Parquet footers per query per node. The RuntimeProfile has also been updated to keep a track of, display and serialize this new counter to thrift. BE tests have been added to verify that this counter works fine. Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile-test.cc M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M common/thrift/RuntimeProfile.thrift M tests/query_test/test_scanners.py 8 files changed, 303 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/4371/11 -- To view, visit http://gerrit.cloudera.org:8080/4371 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-3823: Add timer to measure Parquet footer reads .. Patch Set 10: Code-Review+1 (3 comments) Sorry for the delay. I've made the changes. Carry +1 http://gerrit.cloudera.org:8080/#/c/4371/9/be/src/util/runtime-profile-counters.h File be/src/util/runtime-profile-counters.h: PS9, Line 256: // Protects min > comment on what this protects Done http://gerrit.cloudera.org:8080/#/c/4371/9/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: PS9, Line 1019: lock_guard > Was there a reason not to take lock_ here? No, that was a mistake. Added it now. http://gerrit.cloudera.org:8080/#/c/4371/9/common/thrift/RuntimeProfile.thrift File common/thrift/RuntimeProfile.thrift: PS9, Line 94: summary stats counter > update Done -- To view, visit http://gerrit.cloudera.org:8080/4371 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads
Hello Henry Robinson, Matthew Jacobs, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4371 to look at the new patch set (#10). Change subject: IMPALA-3823: Add timer to measure Parquet footer reads .. IMPALA-3823: Add timer to measure Parquet footer reads It's been observed that Parquet footer reads perform poorly especially when reading from S3. This patch adds a timer "FooterProcessingTimer" which keeps a track of the average time each split of each scan node spends in reading and processing the parquet footer. Added a new utility counter called SummaryStatsCounter which keeps track of the min, max and average values seen so far from a set of values. This counter is used to calculate the min, max and average time taken to scan and process Parquet footers per query per node. The RuntimeProfile has also been updated to keep a track of, display and serialize this new counter to thrift. BE tests have been added to verify that this counter works fine. Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile-test.cc M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M common/thrift/RuntimeProfile.thrift M tests/query_test/test_scanners.py 8 files changed, 304 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/4371/10 -- To view, visit http://gerrit.cloudera.org:8080/4371 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3823: Add timer to measure Parquet footer reads .. Patch Set 9: (3 comments) http://gerrit.cloudera.org:8080/#/c/4371/9/be/src/util/runtime-profile-counters.h File be/src/util/runtime-profile-counters.h: PS9, Line 256: SpinLock lock_; comment on what this protects http://gerrit.cloudera.org:8080/#/c/4371/9/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: PS9, Line 1019: counter->name = name; Was there a reason not to take lock_ here? http://gerrit.cloudera.org:8080/#/c/4371/9/common/thrift/RuntimeProfile.thrift File common/thrift/RuntimeProfile.thrift: PS9, Line 94: min max average value update -- To view, visit http://gerrit.cloudera.org:8080/4371 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-3823: Add timer to measure Parquet footer reads .. Patch Set 8: (11 comments) http://gerrit.cloudera.org:8080/#/c/4371/8/be/src/util/runtime-profile-counters.h File be/src/util/runtime-profile-counters.h: PS8, Line 204: Contrary to > "Unlike" Done PS8, Line 204: a > remove Done PS8, Line 206: The average is stored in the 'value_' member of the base class. > Referring to member variables in a class description is usually a sign the Yes, I've changed it to say value() instead. Line 209: SummaryStatsCounter(TUnit::type unit, int32_t total_num_values, > Comment what this constructor is used for (is it when merging two SSCounter It's for replacing the existing counter with new values. PS8, Line 216: total_num_values > can total_num_values be 0? I don't think it happens in practice, but I've made a change to handle that case too. PS8, Line 247: This keeps track of t > Remove - just "The total..." is sufficient. Done PS8, Line 255: SpinLock counter_lock_; > any reason not to call this lock_ ? Nope, changed it. http://gerrit.cloudera.org:8080/#/c/4371/8/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: PS8, Line 664: /* Print all SummaryStatsCounters as following: : : (Avg: ; Min: ; Max: ; : Number of samples: ) */ > Comment style? Done PS8, Line 669: are > is Done PS8, Line 1019: { min_ = new_value; } > remove parentheses for single-line if statements. Done http://gerrit.cloudera.org:8080/#/c/4371/8/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: PS8, Line 317: \ > remove Done -- To view, visit http://gerrit.cloudera.org:8080/4371 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-3823: Add timer to measure Parquet footer reads .. Patch Set 9: Code-Review+1 Carry +1. -- To view, visit http://gerrit.cloudera.org:8080/4371 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads
Hello Henry Robinson, Matthew Jacobs, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4371 to look at the new patch set (#9). Change subject: IMPALA-3823: Add timer to measure Parquet footer reads .. IMPALA-3823: Add timer to measure Parquet footer reads It's been observed that Parquet footer reads perform poorly especially when reading from S3. This patch adds a timer "FooterProcessingTimer" which keeps a track of the average time each split of each scan node spends in reading and processing the parquet footer. Added a new utility counter called SummaryStatsCounter which keeps track of the min, max and average values seen so far from a set of values. This counter is used to calculate the min, max and average time taken to scan and process Parquet footers per query per node. The RuntimeProfile has also been updated to keep a track of, display and serialize this new counter to thrift. BE tests have been added to verify that this counter works fine. Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile-test.cc M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M common/thrift/RuntimeProfile.thrift M tests/query_test/test_scanners.py 8 files changed, 302 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/4371/9 -- To view, visit http://gerrit.cloudera.org:8080/4371 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads
Dan Hecht has posted comments on this change. Change subject: IMPALA-3823: Add timer to measure Parquet footer reads .. Patch Set 8: I can look at this once you address Henry's comments. -- To view, visit http://gerrit.cloudera.org:8080/4371 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads
Henry Robinson has posted comments on this change. Change subject: IMPALA-3823: Add timer to measure Parquet footer reads .. Patch Set 8: Code-Review+1 (11 comments) http://gerrit.cloudera.org:8080/#/c/4371/8/be/src/util/runtime-profile-counters.h File be/src/util/runtime-profile-counters.h: PS8, Line 204: Contrary to "Unlike" PS8, Line 204: a remove PS8, Line 206: The average is stored in the 'value_' member of the base class. Referring to member variables in a class description is usually a sign the text is leaking implementation details. Do you mean to say that value() returns the average? Line 209: SummaryStatsCounter(TUnit::type unit, int32_t total_num_values, Comment what this constructor is used for (is it when merging two SSCounters together?) PS8, Line 216: total_num_values can total_num_values be 0? PS8, Line 247: This keeps track of t Remove - just "The total..." is sufficient. PS8, Line 255: SpinLock counter_lock_; any reason not to call this lock_ ? http://gerrit.cloudera.org:8080/#/c/4371/8/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: PS8, Line 664: /* Print all SummaryStatsCounters as following: : : (Avg: ; Min: ; Max: ; : Number of samples: ) */ Comment style? PS8, Line 669: are is PS8, Line 1019: { min_ = new_value; } remove parentheses for single-line if statements. http://gerrit.cloudera.org:8080/#/c/4371/8/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: PS8, Line 317: \ remove -- To view, visit http://gerrit.cloudera.org:8080/4371 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads
Henry Robinson has posted comments on this change. Change subject: IMPALA-3823: Add timer to measure Parquet footer reads .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/4371/6/be/src/util/runtime-profile-counters.h File be/src/util/runtime-profile-counters.h: Line 228: int32_t total_num_values() const { return total_num_values_; } > Since it's only a read, I figured even if the CPU reorders the read instruc Without reader synchronization you can probably have the following sequence: Previously, counter.min = counter.max = 0 Writer: counter.min = V counter.max = V Reader: mn = counter->min_value() // == V mx = counter->max_value() // == 0 DCHECK_LE(mn, mx) << "Fails" So you would need to synchronize with the writer to ensure that an update appears atomic. TSO helps you avoid certain kinds of ordering bugs - but remember that others are trying to port this to PPC. You have, in RuntimeProfile::SummaryStatsCounter::UpdateCounter(): ++total_num_values_; sum_ += new_value; If you did an unsynchronized read of sum and total_num_values on a non-TSO architecture, you could easily get total_num_values == 0 but sum > 0. -- To view, visit http://gerrit.cloudera.org:8080/4371 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-3823: Add timer to measure Parquet footer reads .. Patch Set 6: Code-Review+1 (3 comments) Carry +1 http://gerrit.cloudera.org:8080/#/c/4371/5/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: PS5, Line 1020: if (new_value < min_) { min_ = new_value; } : if (new_value > max_) { max_ = new_value; } : } > 1 line Done PS5, Line 1023: : void RuntimeProfile::SummaryStatsCounter::SetStats(const TSummaryStatsCounter& counter) { : l > 1 line Done http://gerrit.cloudera.org:8080/#/c/4371/5/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: Line 338: # and average values as the same since we have only one sample (i.e. only one range) > can you add a TODO for handling the case when there is more than one range Done -- To view, visit http://gerrit.cloudera.org:8080/4371 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads
Sailesh Mukil has uploaded a new patch set (#5). Change subject: IMPALA-3823: Add timer to measure Parquet footer reads .. IMPALA-3823: Add timer to measure Parquet footer reads It's been observed that Parquet footer reads perform poorly especially when reading from S3. This patch adds a timer "FooterProcessingTimer" which keeps a track of the average time each split of each scan node spends in reading and processing the parquet footer. Added a new utility counter called SummaryStatsCounter which keeps track of the min, max and average values seen so far from a set of values. This counter is used to calculate the min, max and average time taken to scan and process Parquet footers per query per node. The RuntimeProfile has also been updated to keep a track of, display and serialize this new counter to thrift. BE tests have been added to verify that this counter works fine. Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile-test.cc M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M common/thrift/RuntimeProfile.thrift M tests/query_test/test_scanners.py 8 files changed, 288 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/4371/5 -- To view, visit http://gerrit.cloudera.org:8080/4371 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3823: Add timer to measure Parquet footer reads .. Patch Set 3: (5 comments) http://gerrit.cloudera.org:8080/#/c/4371/3/be/src/exec/hdfs-parquet-scanner.h File be/src/exec/hdfs-parquet-scanner.h: PS3, Line 430: avg_timer_ let's update the variable names too. e.g. something like process_footer_timer_stats_ or such http://gerrit.cloudera.org:8080/#/c/4371/2/be/src/util/runtime-profile-counters.h File be/src/util/runtime-profile-counters.h: PS2, Line 240: t TSummaryStatsCount > Yes that's right. The only reason I kept it as a Counter instead of a value Thanks I think it's more useful to remove the restriction. Even in hdfs-parquet-scanner where you're using this, there is a Counter constructed just for it's timer, it's not included in a profile. That code might as well have just used a timing function explicitly. (I'm not saying it's worth changing that code, just that there's no dependency on this taking Counters.) Also, I don't think we get much from checking the Counter's unit. We don't really set counter units carefully in many cases (e.g. we use UNIT all over the place instead of anything useful), and ultimately it's up to the user of this interface to make sure what they're providing here makes sense. http://gerrit.cloudera.org:8080/#/c/4371/3/be/src/util/runtime-profile-counters.h File be/src/util/runtime-profile-counters.h: PS3, Line 240: Overwrite "SetStats()" to be more consistent with some of the other fns, e.g. SetSamples(). If I saw Overwrite() I'd wonder why there wasn't just a copy constructor or something like that. http://gerrit.cloudera.org:8080/#/c/4371/2/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: Line 661: } > Done thanks! http://gerrit.cloudera.org:8080/#/c/4371/3/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: PS3, Line 668: - I don't see this dash in the example above? Is it intentional? I see the event markers under an event sequence make a list, though those are items of a specific timeline. I don't see why that'd be the case here. Also value() is an average, right? I think it'd be good to have that specified. It'd be nice to include the number of samples as well. Also, avg isn't really all that different from the other stats, so maybe just print them all the same way, e.g.: " : avg: , min: , max: , num_samples: " -- To view, visit http://gerrit.cloudera.org:8080/4371 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes