Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/11575 )
Change subject: IMPALA-6964: Track stats about column and page sizes in Parquet reader ...................................................................... Patch Set 5: (9 comments) http://gerrit.cloudera.org:8080/#/c/11575/5/be/src/exec/hdfs-parquet-scanner.h File be/src/exec/hdfs-parquet-scanner.h: http://gerrit.cloudera.org:8080/#/c/11575/5/be/src/exec/hdfs-parquet-scanner.h@662 PS5, Line 662: ParquetColumnReade > nit: typo Done http://gerrit.cloudera.org:8080/#/c/11575/5/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: http://gerrit.cloudera.org:8080/#/c/11575/5/be/src/exec/hdfs-scan-node-base.cc@635 PS5, Line 635: DCHECK(scanner->get() == nullptr); > DCHECK_EQ Changing to DCHECK_EQ gives me an compile time error: error: ambiguous overload for ‘operator<<’ (operand types are ‘std::ostream {aka std::basic_ostream<char>}’ and ‘std::nullptr_t’) Sounds like DCHECK_EQ requires calling the << operator on scanner->get(), but the scanner doesn't have a resolvable << operator. I didn't actually modify this code. I rebased the patch which is why Gerrit pulled in this diff. http://gerrit.cloudera.org:8080/#/c/11575/5/tests/util/parse_util.py File tests/util/parse_util.py: http://gerrit.cloudera.org:8080/#/c/11575/5/tests/util/parse_util.py@130 PS5, Line 130: _ > By convention we (and Python) often uses a leading _ for private methods. N Done http://gerrit.cloudera.org:8080/#/c/11575/5/tests/util/parse_util.py@137 PS5, Line 137: > Please put an example of a string here that this regex matches. You might e Done http://gerrit.cloudera.org:8080/#/c/11575/5/tests/util/parse_util.py@138 PS5, Line 138: regex_summary_stat = r"\(Avg:.*?\((?P<avg>[0-9]*[.]*[0-9]*)\) ; " \ > Use re.compile, then wrap the RE in """ to not have to use backslashes. Thi Done http://gerrit.cloudera.org:8080/#/c/11575/5/tests/util/parse_util.py@147 PS5, Line 147: for counter in re.findall(counter_name + ".*", runtime_profile): > Will that match newlines, too? It shouldn't. The dot character matches all characters except for newlines, unless the re.DOTALL parameter is specified - https://docs.python.org/2/library/re.html#re.DOTALL http://gerrit.cloudera.org:8080/#/c/11575/5/tests/util/parse_util.py@148 PS5, Line 148: summary_stat = list(re.finditer(regex_summary_stat, counter)) > If you do this line-wise, why would there be more than one match? Replaced with re.search http://gerrit.cloudera.org:8080/#/c/11575/5/tests/util/parse_util.py@149 PS5, Line 149: all > nit: at all? Done http://gerrit.cloudera.org:8080/#/c/11575/5/tests/util/parse_util.py@152 PS5, Line 152: > You can also create a dict using dict(avg=0, min=0, max=0, samples=0) (migh Done -- To view, visit http://gerrit.cloudera.org:8080/11575 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I322f9b324b6828df28e5caf79529085c43d7c817 Gerrit-Change-Number: 11575 Gerrit-PatchSet: 5 Gerrit-Owner: Sahil Takiar <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Sahil Takiar <[email protected]> Gerrit-Comment-Date: Fri, 09 Nov 2018 19:05:37 +0000 Gerrit-HasComments: Yes
