Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9693 )

Change subject: IMPALA-5842: Write page index in Parquet files
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/9693/2/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/9693/2/be/src/exec/hdfs-parquet-table-writer.cc@300
PS2, Line 300:   std::vector<bool> null_pages_;
> Tim:
Yeah, maybe we should combine the code into a shared utility function, since I 
think the core logic is the same.

IMO the sensible thing is to leave the max unset in that case.  Hopefully we 
can amend the spec to allow that.

I think technically now our choices are to drop the entire column index or 
write out a huge value. Dropping the index wouldn't be a total disaster, since 
this should be a rare edge case, but it is kind of crappy.


http://gerrit.cloudera.org:8080/#/c/9693/2/be/src/exec/hdfs-parquet-table-writer.cc@1195
PS2, Line 1195:   // If the file contains more than one row_group, don't write 
the index.
I missed this earlier - this seems weird. Do we ever write more than one row 
group? Do we test it?


http://gerrit.cloudera.org:8080/#/c/9693/2/be/src/exec/hdfs-parquet-table-writer.cc@1224
PS2, Line 1224: n
> I just realized that the min/max values are only strings at this point, at
I think I like the idea of determining asc/desc incrementally instead of doing 
an extra pass. Doing it in Merge() could make sense, but I think we would have 
to clearly document what that means, since Merge() would no longer be a 
commutative operation.



--
To view, visit http://gerrit.cloudera.org:8080/9693
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icbacf7fe3b7672e3ce719261ecef445b16f8dec9
Gerrit-Change-Number: 9693
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Tue, 20 Mar 2018 22:21:36 +0000
Gerrit-HasComments: Yes

Reply via email to