Zoltan Ivanfi has uploaded a new change for review. http://gerrit.cloudera.org:8080/4835
Change subject: Remove seemingly incorrect DCHECK-s.
......................................................................

Remove seemingly incorrect DCHECK-s.

The first conditional DCHECK means that if a page's size is 0 then its
compressed size is also 0. This, however, seems to be a false
assumption, as the compressed output may include metadata, such as
length or checksum. The GZIP compressor, for example, states that an
input of 0 bytes requires 23 bytes when compressed. The Snappy
compressor also mentions storing length information in the compressed
output. The compressed size estimation in the LZ4 compressor also
contains a constant part.

The "Last page might be empty" comment and the second DCHECK also seem
to be based on a false assumption. If a value doesn't fit on the
current page, AppendRow creates a new, possibly bigger page and tries
writing the data to the new page instead. This means that if the data
is bigger than the page size, then the current page is finalized and a
new page is added, even if the original page was empty. In other words,
empty pages can occur in the middle of the pages_ array as well, not
only at the end of it.

Change-Id: I52e1b1354e9ea056b49331e75e53759952a81b76
---
M be/src/exec/hdfs-parquet-table-writer.cc
1 file changed, 1 insertion(+), 3 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/4835/1

--
To view, visit http://gerrit.cloudera.org:8080/4835
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I52e1b1354e9ea056b49331e75e53759952a81b76
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zoltan Ivanfi <[email protected]>
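
Illustrative note on the second DCHECK: the sketch below is a simplified model of the append behavior described in the commit message, not the actual code in hdfs-parquet-table-writer.cc. The names Page, ColumnWriterSketch and AppendValue are hypothetical, and sizes are plain byte counts. It only aims to show that an oversized value arriving while the current page is still empty leaves a zero-length page that is not the last element of pages_.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a data page; it only tracks sizes.
struct Page {
  explicit Page(std::size_t cap) : capacity(cap), used(0) {}
  std::size_t capacity;
  std::size_t used;
};

// Hypothetical model of the writer: if a value does not fit on the
// current page, that page is left as it is (even when empty) and a
// new, possibly bigger page is started for the value.
class ColumnWriterSketch {
 public:
  explicit ColumnWriterSketch(std::size_t default_page_size)
      : default_page_size_(default_page_size) {
    assert(default_page_size > 0);
    pages_.push_back(Page(default_page_size_));
  }

  void AppendValue(std::size_t value_size) {
    if (pages_.back().used + value_size > pages_.back().capacity) {
      // The new page is at least large enough for the value.
      pages_.push_back(Page(std::max(default_page_size_, value_size)));
    }
    pages_.back().used += value_size;
  }

  const std::vector<Page>& pages() const { return pages_; }

 private:
  std::size_t default_page_size_;
  std::vector<Page> pages_;
};
```

In this model, a writer with a default page size of 8 that receives a 20-byte value followed by a 4-byte value ends up with three pages, and the empty one is the first, so a check that only tolerates an empty last page would not hold.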
