Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/11575 )
Change subject: IMPALA-6964: Track stats about column and page sizes in Parquet reader ...................................................................... Patch Set 5: (9 comments) http://gerrit.cloudera.org:8080/#/c/11575/5/be/src/exec/hdfs-parquet-scanner.h File be/src/exec/hdfs-parquet-scanner.h: http://gerrit.cloudera.org:8080/#/c/11575/5/be/src/exec/hdfs-parquet-scanner.h@662 PS5, Line 662: ParquetColumnReade nit: typo http://gerrit.cloudera.org:8080/#/c/11575/5/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: http://gerrit.cloudera.org:8080/#/c/11575/5/be/src/exec/hdfs-scan-node-base.cc@635 PS5, Line 635: DCHECK(scanner->get() == nullptr); DCHECK_EQ http://gerrit.cloudera.org:8080/#/c/11575/5/tests/util/parse_util.py File tests/util/parse_util.py: http://gerrit.cloudera.org:8080/#/c/11575/5/tests/util/parse_util.py@130 PS5, Line 130: _ By convention we (and Python) often uses a leading _ for private methods. Now that it is used in other modules you should remove it. http://gerrit.cloudera.org:8080/#/c/11575/5/tests/util/parse_util.py@137 PS5, Line 137: Please put an example of a string here that this regex matches. You might even put an assert regex_...matches(example) there. http://gerrit.cloudera.org:8080/#/c/11575/5/tests/util/parse_util.py@138 PS5, Line 138: regex_summary_stat = r"\(Avg:.*?\((?P<avg>[0-9]*[.]*[0-9]*)\) ; " \ Use re.compile, then wrap the RE in """ to not have to use backslashes. This page has more details: https://docs.python.org/2/library/re.html#re.VERBOSE http://gerrit.cloudera.org:8080/#/c/11575/5/tests/util/parse_util.py@147 PS5, Line 147: for counter in re.findall(counter_name + ".*", runtime_profile): Will that match newlines, too? http://gerrit.cloudera.org:8080/#/c/11575/5/tests/util/parse_util.py@148 PS5, Line 148: summary_stat = list(re.finditer(regex_summary_stat, counter)) If you do this line-wise, why would there be more than one match? http://gerrit.cloudera.org:8080/#/c/11575/5/tests/util/parse_util.py@149 PS5, Line 149: all nit: at all? http://gerrit.cloudera.org:8080/#/c/11575/5/tests/util/parse_util.py@152 PS5, Line 152: > flake8: E203 whitespace before ':' You can also create a dict using dict(avg=0, min=0, max=0, samples=0) (might be a bit more concise). You can also add an assert to make sure the counter is actually empty. -- To view, visit http://gerrit.cloudera.org:8080/11575 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I322f9b324b6828df28e5caf79529085c43d7c817 Gerrit-Change-Number: 11575 Gerrit-PatchSet: 5 Gerrit-Owner: Sahil Takiar <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Sahil Takiar <[email protected]> Gerrit-Comment-Date: Thu, 08 Nov 2018 03:10:44 +0000 Gerrit-HasComments: Yes
