Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9693 )
Change subject: IMPALA-5842: Write page index in Parquet files ...................................................................... Patch Set 5: (16 comments) Thanks Csaba! (and I also updated my vimrc... :) ) http://gerrit.cloudera.org:8080/#/c/9693/4/be/src/exec/parquet-column-stats.h File be/src/exec/parquet-column-stats.h: http://gerrit.cloudera.org:8080/#/c/9693/4/be/src/exec/parquet-column-stats.h@159 PS4, Line 159: // If true, min/max values are ascending. > nit: missing . Done http://gerrit.cloudera.org:8080/#/c/9693/4/be/src/exec/parquet-column-stats.h@162 PS4, Line 162: // if true min/max values are descending. > nit: missing . Done http://gerrit.cloudera.org:8080/#/c/9693/4/be/src/exec/parquet-column-stats.inline.h File be/src/exec/parquet-column-stats.inline.h: http://gerrit.cloudera.org:8080/#/c/9693/4/be/src/exec/parquet-column-stats.inline.h@207 PS4, Line 207: if (prev_page_min_buffer_.IsEmpty()) { : RETURN_IF_ERROR(CopyToBuffer(&prev_page_min_buffer_, &prev_page_min_value_)); > nit: long lines Done http://gerrit.cloudera.org:8080/#/c/9693/4/tests/query_test/test_parquet_page_index.py File tests/query_test/test_parquet_page_index.py: http://gerrit.cloudera.org:8080/#/c/9693/4/tests/query_test/test_parquet_page_index.py@71 PS4, Line 71: column_index_length) > nit: not enough indentation Done http://gerrit.cloudera.org:8080/#/c/9693/4/tests/query_test/test_parquet_page_index.py@82 PS4, Line 82: offset_index_length) > nit: not enough indentation Done http://gerrit.cloudera.org:8080/#/c/9693/4/tests/query_test/test_parquet_page_index.py@85 PS4, Line 85: page_header = get_page_header(parquet_file, page_loc.offset, : page_loc.compressed_page_size) : page_headers.append(page_header) > nit: too much indentation Done http://gerrit.cloudera.org:8080/#/c/9693/4/tests/query_test/test_parquet_page_index.py@113 PS4, Line 113: for null_page, value in zip(null_pages, values): > Are these prints intentional? No, jus left them here. http://gerrit.cloudera.org:8080/#/c/9693/4/tests/query_test/test_parquet_page_index.py@118 PS4, Line 118: elif ordering == BoundaryOrder.DESCENDING and previouse_value is not None: > Is this print intentional? Done http://gerrit.cloudera.org:8080/#/c/9693/4/tests/query_test/test_parquet_page_index.py@120 PS4, Line 120: previous_value = current_value > nit: too much indentation Done http://gerrit.cloudera.org:8080/#/c/9693/4/tests/query_test/test_parquet_page_index.py@122 PS4, Line 122: > nit: too much indentation Done http://gerrit.cloudera.org:8080/#/c/9693/4/tests/query_test/test_parquet_page_index.py@164 PS4, Line 164: assert null_page : # Everything is None, no further checks needed. : return : : min_value = decode_stats_value(column_info.schema, min_value_str) : for null_p > nit: too much indentation Done http://gerrit.cloudera.org:8080/#/c/9693/4/tests/query_test/test_parquet_page_index.py@200 PS4, Line 200: ).format(qualified_table_name, sorting_column, source_table) > nit: not enough indentation Done http://gerrit.cloudera.org:8080/#/c/9693/4/tests/query_test/test_parquet_page_index.py@203 PS4, Line 203: self._validate_parquet_index(hdfs_path, sorting_column, tmpdir) > nit: not enough indentation Done http://gerrit.cloudera.org:8080/#/c/9693/4/tests/query_test/test_parquet_page_index.py@244 PS4, Line 244: > nit: missing . Done http://gerrit.cloudera.org:8080/#/c/9693/4/tests/query_test/test_parquet_page_index.py@249 PS4, Line 249: self._ctas_table_and_verify_index(vector, unique_database, > nit: missing . Done http://gerrit.cloudera.org:8080/#/c/9693/4/tests/util/get_parquet_metadata.py File tests/util/get_parquet_metadata.py: http://gerrit.cloudera.org:8080/#/c/9693/4/tests/util/get_parquet_metadata.py@169 PS4, Line 169: def get_column_index(filename, file_pos, length): : """Reads a ColumnIndex object from a file at the given position.""" : serialized_column_index = read_serialized_object(filename, file_pos, length) : protocol = create_protocol(serialized_column_index) : column_index = ColumnIndex() : column_index.read(protocol) : return column_index : : : def get_page_header(filename, file_pos, length): : """Reads a PageHeader object from a file at the given position.""" : serialized_page_header = read_serialized_object(filename, file_pos, length) : protocol = create_protocol(serialized_page_header) : page_header = PageHeader() : page_header.read(protocol) : return page_header : : : def get_offset_index(filename, file_pos, length): : """Reads an OffsetIndex object from a file at the given position.""" : serialized_offset_index = read_serialized_object(filename, file_pos, length) : protocol = create_protocol(serialized_offset_index) : offset_index = OffsetIndex() : offset_index.read(protocol) : return offset_index > nit: too much indentation Done -- To view, visit http://gerrit.cloudera.org:8080/9693 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icbacf7fe3b7672e3ce719261ecef445b16f8dec9 Gerrit-Change-Number: 9693 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Mon, 26 Mar 2018 17:48:47 +0000 Gerrit-HasComments: Yes
