Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11575 )

Change subject: IMPALA-6964: Track stats about column and page sizes in Parquet 
reader
......................................................................


Patch Set 5:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/11575/5/be/src/exec/hdfs-parquet-scanner.h
File be/src/exec/hdfs-parquet-scanner.h:

http://gerrit.cloudera.org:8080/#/c/11575/5/be/src/exec/hdfs-parquet-scanner.h@662
PS5, Line 662: ParquetColumnReade
> nit: typo
Done


http://gerrit.cloudera.org:8080/#/c/11575/5/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/11575/5/be/src/exec/hdfs-scan-node-base.cc@635
PS5, Line 635:   DCHECK(scanner->get() == nullptr);
> DCHECK_EQ
Changing to DCHECK_EQ gives me an compile time error:

error: ambiguous overload for ‘operator<<’ (operand types are ‘std::ostream 
{aka std::basic_ostream<char>}’ and ‘std::nullptr_t’)

Sounds like DCHECK_EQ requires calling the << operator on scanner->get(), but 
the scanner doesn't have a resolvable << operator.

I didn't actually modify this code. I rebased the patch which is why Gerrit 
pulled in this diff.


http://gerrit.cloudera.org:8080/#/c/11575/5/tests/util/parse_util.py
File tests/util/parse_util.py:

http://gerrit.cloudera.org:8080/#/c/11575/5/tests/util/parse_util.py@130
PS5, Line 130: _
> By convention we (and Python) often uses a leading _ for private methods. N
Done


http://gerrit.cloudera.org:8080/#/c/11575/5/tests/util/parse_util.py@137
PS5, Line 137:
> Please put an example of a string here that this regex matches. You might e
Done


http://gerrit.cloudera.org:8080/#/c/11575/5/tests/util/parse_util.py@138
PS5, Line 138:   regex_summary_stat = r"\(Avg:.*?\((?P<avg>[0-9]*[.]*[0-9]*)\) 
; " \
> Use re.compile, then wrap the RE in """ to not have to use backslashes. Thi
Done


http://gerrit.cloudera.org:8080/#/c/11575/5/tests/util/parse_util.py@147
PS5, Line 147:   for counter in re.findall(counter_name + ".*", 
runtime_profile):
> Will that match newlines, too?
It shouldn't. The dot character matches all characters except for newlines, 
unless the re.DOTALL parameter is specified - 
https://docs.python.org/2/library/re.html#re.DOTALL


http://gerrit.cloudera.org:8080/#/c/11575/5/tests/util/parse_util.py@148
PS5, Line 148:     summary_stat = list(re.finditer(regex_summary_stat, counter))
> If you do this line-wise, why would there be more than one match?
Replaced with re.search


http://gerrit.cloudera.org:8080/#/c/11575/5/tests/util/parse_util.py@149
PS5, Line 149: all
> nit: at all?
Done


http://gerrit.cloudera.org:8080/#/c/11575/5/tests/util/parse_util.py@152
PS5, Line 152:
> You can also create a dict using dict(avg=0, min=0, max=0, samples=0) (migh
Done



--
To view, visit http://gerrit.cloudera.org:8080/11575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I322f9b324b6828df28e5caf79529085c43d7c817
Gerrit-Change-Number: 11575
Gerrit-PatchSet: 5
Gerrit-Owner: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Comment-Date: Fri, 09 Nov 2018 19:05:37 +0000
Gerrit-HasComments: Yes

Reply via email to