Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9693 )

Change subject: IMPALA-5842: Write page index in Parquet files
......................................................................


Patch Set 17:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/9693/16/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/9693/16/be/src/exec/hdfs-parquet-table-writer.cc@769
PS16, Line 769:     if 
(UNLIKELY(!table_sink_mem_tracker_->TryConsume(new_memory_allocation))) {
> Need to increment this after the TryConsume() succeeds, otherwise we'll rel
Ouch, right. Done.


http://gerrit.cloudera.org:8080/#/c/9693/15/be/src/util/string-util-test.cc
File be/src/util/string-util-test.cc:

http://gerrit.cloudera.org:8080/#/c/9693/15/be/src/util/string-util-test.cc@68
PS15, Line 68:   EvalTruncation(normal_plus_max, "abcdeg", 10, UP);
> Thanks for adding more tests. I meant the exact 3-character string consisti
I had a similar test, when I used the '127' literal.
I updated it to use your 3-character string.


http://gerrit.cloudera.org:8080/#/c/9693/16/be/src/util/string-util.cc
File be/src/util/string-util.cc:

http://gerrit.cloudera.org:8080/#/c/9693/16/be/src/util/string-util.cc@46
PS16, Line 46:   // We convert it to unsigned because signed overflow results 
in undefined behavior.
> I think this could use a comment and/or an explicit cast to make the intent
Added comment and cast.


http://gerrit.cloudera.org:8080/#/c/9693/16/tests/query_test/test_parquet_page_index.py
File tests/query_test/test_parquet_page_index.py:

http://gerrit.cloudera.org:8080/#/c/9693/16/tests/query_test/test_parquet_page_index.py@36
PS16, Line 36:
> update
Done


http://gerrit.cloudera.org:8080/#/c/9693/16/tests/query_test/test_parquet_page_index.py@55
PS16, Line 55:
> nit: PEP8 says the closing """ should be on a new line. You can use flake8
Done


http://gerrit.cloudera.org:8080/#/c/9693/16/tests/query_test/test_parquet_page_index.py@90
PS16, Line 90:       column_info = ColumnInfo(schema, stats, offset_index, 
column_index, page_headers)
> I think you should be able to just pass all the parameters to the ctor "Col
Done


http://gerrit.cloudera.org:8080/#/c/9693/16/tests/query_test/test_parquet_page_index.py@96
PS16, Line 96: e_headers)
> "ColumnInfo objects" or "column infos"
"column infos", done.


http://gerrit.cloudera.org:8080/#/c/9693/16/tests/query_test/test_parquet_page_index.py@97
PS16, Line 97:
> the first row group?
Done


http://gerrit.cloudera.org:8080/#/c/9693/16/tests/query_test/test_parquet_page_index.py@110
PS16, Line 110: sert previous_loc.offset < current
> You can zip a list with itself to create (prev, cur) pairs:
Nice! Thanks, done.


http://gerrit.cloudera.org:8080/#/c/9693/16/tests/query_test/test_parquet_page_index.py@124
PS16, Line 124: sert page
> nit: page_is_null might be more clear
Done


http://gerrit.cloudera.org:8080/#/c/9693/16/tests/query_test/test_parquet_page_index.py@182
PS16, Line 182:       else:
> Would it make sense to validate the ordering instead of re-computing it? I.
Yeah, I think it's better to use a different algorithm with the same semantics. 
Rewrote this function.


http://gerrit.cloudera.org:8080/#/c/9693/16/tests/query_test/test_parquet_page_index.py@214
PS16, Line 214:
> unused?
Removed it.


http://gerrit.cloudera.org:8080/#/c/9693/16/tests/query_test/test_parquet_page_index.py@216
PS16, Line 216:  and that th
> Stale comment?
Yes, removed it.


http://gerrit.cloudera.org:8080/#/c/9693/16/tests/query_test/test_parquet_page_index.py@249
PS16, Line 249:     else:
> move closer to the comment explaining the intent
Done


http://gerrit.cloudera.org:8080/#/c/9693/16/tests/query_test/test_parquet_page_index.py@321
PS16, Line 321:  tmp
> nit: end
Done


http://gerrit.cloudera.org:8080/#/c/9693/16/tests/query_test/test_parquet_page_index.py@348
PS16, Line 348:
              :     column =
> I had missed this looking at the truncation code. Why do we keep the zeroes
Yeah, I don't think we need those. Changed the logic.


http://gerrit.cloudera.org:8080/#/c/9693/16/tests/util/get_parquet_metadata.py
File tests/util/get_parquet_metadata.py:

http://gerrit.cloudera.org:8080/#/c/9693/16/tests/util/get_parquet_metadata.py@31
PS16, Line 31: ""
             :   with open(file
> That does not actually seem to be a requirement of the function. Should we
Removed the last two sentences from the comment and renamed the method to 
'read_range'.


http://gerrit.cloudera.org:8080/#/c/9693/16/tests/util/get_parquet_metadata.py@39
PS16, Line 39:   contain a serialized thrift object."""
> Mention "serialized_object" in the comment, or rename the parameter to buff
Renamed the parameter to 'serialized_object_buffer'.


http://gerrit.cloudera.org:8080/#/c/9693/16/tests/util/get_parquet_metadata.py@171
PS16, Line 171: (serialized_thrift_obj
> Can read_serialized_object also create the protocol and return it? You coul
I like the idea since we won't duplicate code that much.



--
To view, visit http://gerrit.cloudera.org:8080/9693
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icbacf7fe3b7672e3ce719261ecef445b16f8dec9
Gerrit-Change-Number: 9693
Gerrit-PatchSet: 17
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward #248
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Mon, 14 May 2018 16:58:38 +0000
Gerrit-HasComments: Yes

Reply via email to