Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12004 )

Change subject: IMPALA-7889: Write new logical types in Parquet
......................................................................


Patch Set 5:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/hdfs-parquet-table-writer.cc
File be/src/exec/parquet/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/hdfs-parquet-table-writer.cc@973
PS5, Line 973:     parquet::SchemaElement& node = file_metadata_.schema[i + 1];
Maybe choose a better name here?


http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/parquet-metadata-utils.h
File be/src/exec/parquet/parquet-metadata-utils.h:

http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/parquet-metadata-utils.h@62
PS5, Line 62:   static void FillSchemaElement(parquet::SchemaElement& node, 
const ColumnType& type,
We usually put input parameters first, then output, i.e. move node to the end. 
By convention we also pass output parameters by pointer.


http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/parquet-metadata-utils.cc
File be/src/exec/parquet/parquet-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/parquet-metadata-utils.cc@73
PS5, Line 73: };
This is actually the end of the anonymous namespace and confused me. Please 
remove the semicolon and add a comment  // anonymous namespace.


http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/parquet-metadata-utils.cc@78
PS5, Line 78: bool IsSupportedType(PrimitiveType impala_type,
This method should be in the anonymous namespace


http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/parquet-metadata-utils.cc@153
PS5, Line 153: static bool IsEncodingSupported(parquet::Encoding::type e) {
This can be in the anonymous namespace, too


http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/parquet-metadata-utils.cc@283
PS5, Line 283: void SetIntLogicalType(parquet::SchemaElement& node, int 
bitwidth) {
Move to the anonymous namespace. Pass output parameters by pointer and move to 
the end of the parameter list. Also add a brief function comment.


http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/parquet-metadata-utils.cc@292
PS5, Line 292: node
Can you think of a better name for the schema element than "node"?


http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/parquet-metadata-utils.cc@297
PS5, Line 297:   if (type.type == TYPE_DECIMAL) {
Should this be a case statement instead?


http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/parquet-metadata-utils.cc@310
PS5, Line 310:   else if (type.type == TYPE_VARCHAR || type.type == TYPE_CHAR ||
Move else to previous line.


http://gerrit.cloudera.org:8080/#/c/12004/5/be/src/exec/parquet/parquet-metadata-utils.cc@311
PS5, Line 311:       (type.type == TYPE_STRING && 
query_options.parquet_annotate_strings_utf8)) {
Please add a brief comment that VARCHAR and CHAR are always UTF8 annotated, but 
STRING is only when set by the query option (I had to look it up).


http://gerrit.cloudera.org:8080/#/c/12004/5/tests/query_test/test_insert_parquet.py
File tests/query_test/test_insert_parquet.py:

http://gerrit.cloudera.org:8080/#/c/12004/5/tests/query_test/test_insert_parquet.py@362
PS5, Line 362:   def _ctas_and_get_metadata(self, vector, unique_database, 
tmp_dir, source_table):
Add function comment (here and for all other new functions


http://gerrit.cloudera.org:8080/#/c/12004/5/tests/query_test/test_insert_parquet.py@376
PS5, Line 376:     file_metadata_list = 
get_parquet_metadata_from_hdfs_folder(hdfs_path, tmp_dir)
Maybe you can clean up some of the other uses of os.walk() in this file?


http://gerrit.cloudera.org:8080/#/c/12004/5/tests/query_test/test_insert_parquet.py@388
PS5, Line 388:     schema = self._find_schema(schemas, column_name)
You could add

  assert schema is not None

You could also add that in _find_schema, since it seems required that the 
schema exists.


http://gerrit.cloudera.org:8080/#/c/12004/5/tests/query_test/test_insert_parquet.py@394
PS5, Line 394:     bit_width_to_converted_type_map = {8: 15, 16: 16, 32: 17, 
64: 18}
Can you replace the values with the names from parquet.thrift?


http://gerrit.cloudera.org:8080/#/c/12004/5/tests/query_test/test_insert_parquet.py@397
PS5, Line 397:     assert schema.logicalType.INTEGER is not None
The other types have to be None, right? If so, maybe create a helper 
_check_only_one_type_is_set(the_type) and then make sure that all others are 
None.


http://gerrit.cloudera.org:8080/#/c/12004/5/tests/query_test/test_insert_parquet.py@432
PS5, Line 432: if
nit: once


http://gerrit.cloudera.org:8080/#/c/12004/5/tests/util/get_parquet_metadata.py
File tests/util/get_parquet_metadata.py:

http://gerrit.cloudera.org:8080/#/c/12004/5/tests/util/get_parquet_metadata.py@181
PS5, Line 181:     for f in files:
Add (as some uses of this in test_insert_parquet.py do):

 if not f.endswith('parq'):
  continue



--
To view, visit http://gerrit.cloudera.org:8080/12004
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6f377950845683ab9c6dea79f4c54db0359d0b91
Gerrit-Change-Number: 12004
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Comment-Date: Fri, 30 Nov 2018 05:10:39 +0000
Gerrit-HasComments: Yes

Reply via email to