Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11000 )

Change subject: IMPALA-5542: Impala cannot scan Parquet decimal stored as 
int64_t/int32_t
......................................................................


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/11000/1/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/11000/1/be/src/exec/parquet-column-readers.cc@1536
PS1, Line 1536: DCHECK_EQ(sizeof(Decimal4Value::StorageType), 
slot_desc->type().GetByteSize());
> i dont remember where the file level column metadata validation occurs in c
I added the checks as there but leave these DCHECKs here as well if you don't 
mind.


http://gerrit.cloudera.org:8080/#/c/11000/1/be/src/exec/parquet-common.h
File be/src/exec/parquet-common.h:

http://gerrit.cloudera.org:8080/#/c/11000/1/be/src/exec/parquet-common.h@62
PS1, Line 62:   static int ByteSize(const InternalType& v) { return 
sizeof(InternalType); }
> This methods represents the encoded byte size if the internalType is writte
Yeah, I was also a bit confused about it.
So, should I leave it as it is?


http://gerrit.cloudera.org:8080/#/c/11000/1/be/src/exec/parquet-common.h@182
PS1, Line 182:   /// Decodes t from 'buffer', reading up to the byte before 
'buffer_end'. 'buffer'
             :   /// need not be aligned. If PARQUET_TYPE is 
FIXED_LEN_BYTE_ARRAY then 'fixed_len_size'
             :   /// is the size of the object. Otherwise, it is unused.
             :   /// Returns the number of bytes read or -1 if the value was 
not decoded successfully.
> I think it will be super helpful to list which types are decoded by this te
I agree, I added a table based on SUPPORTED_PHYSICAL_TYPES in 
parquet-metadata-utils.cc, the Decode() functions in parquet-common.h, and the 
switch statement in HdfsParquetTableWriter::Init().


http://gerrit.cloudera.org:8080/#/c/11000/1/be/src/exec/parquet-common.h@208
PS1, Line 208: / disallow it.
             : template <> int ParquetPlainEncoder::ByteSize(const ColumnType
> a bit misleading, this implies that impala only supports this type for test
True, I think I was a bit confused when I wrote these comments.


http://gerrit.cloudera.org:8080/#/c/11000/1/be/src/exec/parquet-metadata-utils.cc
File be/src/exec/parquet-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/11000/1/be/src/exec/parquet-metadata-utils.cc@195
PS1, Line 195:  if (slot_desc->type().type == TYPE_DECIMAL) {
> here it is! you should add checks here to make sure int32 and int64 encoded
Done.


http://gerrit.cloudera.org:8080/#/c/11000/1/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/11000/1/tests/query_test/test_scanners.py@293
PS1, Line 293:   def _create_table_from_file(self, table_name, unique_database):
> I have created a related ticket in the past:
I'll do this refactor in an other commit.


http://gerrit.cloudera.org:8080/#/c/11000/1/tests/query_test/test_scanners.py@298
PS1, Line 298:     hdfs_file = 
'/test-warehouse/{0}.db/{1}'.format(unique_database, filename)
             :     check_call(['hdfs', 'dfs', '-copyFromLocal', '-f', 
local_file, hdfs_file])
> I am a bit worried about using /test-warehouse/. Your file names are unlike
I chose the first option.



--
To view, visit http://gerrit.cloudera.org:8080/11000
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib8c41bfc7c1664bdba5099d3893dc8dbe4304794
Gerrit-Change-Number: 11000
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Tue, 31 Jul 2018 15:35:04 +0000
Gerrit-HasComments: Yes

Reply via email to