Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11000 )
Change subject: IMPALA-5542: Impala cannot scan Parquet decimal stored as int64_t/int32_t ...................................................................... Patch Set 2: (7 comments) http://gerrit.cloudera.org:8080/#/c/11000/1/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/11000/1/be/src/exec/parquet-column-readers.cc@1536 PS1, Line 1536: DCHECK_EQ(sizeof(Decimal4Value::StorageType), slot_desc->type().GetByteSize()); > i dont remember where the file level column metadata validation occurs in c I added the checks as there but leave these DCHECKs here as well if you don't mind. http://gerrit.cloudera.org:8080/#/c/11000/1/be/src/exec/parquet-common.h File be/src/exec/parquet-common.h: http://gerrit.cloudera.org:8080/#/c/11000/1/be/src/exec/parquet-common.h@62 PS1, Line 62: static int ByteSize(const InternalType& v) { return sizeof(InternalType); } > This methods represents the encoded byte size if the internalType is writte Yeah, I was also a bit confused about it. So, should I leave it as it is? http://gerrit.cloudera.org:8080/#/c/11000/1/be/src/exec/parquet-common.h@182 PS1, Line 182: /// Decodes t from 'buffer', reading up to the byte before 'buffer_end'. 'buffer' : /// need not be aligned. If PARQUET_TYPE is FIXED_LEN_BYTE_ARRAY then 'fixed_len_size' : /// is the size of the object. Otherwise, it is unused. : /// Returns the number of bytes read or -1 if the value was not decoded successfully. > I think it will be super helpful to list which types are decoded by this te I agree, I added a table based on SUPPORTED_PHYSICAL_TYPES in parquet-metadata-utils.cc, the Decode() functions in parquet-common.h, and the switch statement in HdfsParquetTableWriter::Init(). http://gerrit.cloudera.org:8080/#/c/11000/1/be/src/exec/parquet-common.h@208 PS1, Line 208: / disallow it. : template <> int ParquetPlainEncoder::ByteSize(const ColumnType > a bit misleading, this implies that impala only supports this type for test True, I think I was a bit confused when I wrote these comments. http://gerrit.cloudera.org:8080/#/c/11000/1/be/src/exec/parquet-metadata-utils.cc File be/src/exec/parquet-metadata-utils.cc: http://gerrit.cloudera.org:8080/#/c/11000/1/be/src/exec/parquet-metadata-utils.cc@195 PS1, Line 195: if (slot_desc->type().type == TYPE_DECIMAL) { > here it is! you should add checks here to make sure int32 and int64 encoded Done. http://gerrit.cloudera.org:8080/#/c/11000/1/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: http://gerrit.cloudera.org:8080/#/c/11000/1/tests/query_test/test_scanners.py@293 PS1, Line 293: def _create_table_from_file(self, table_name, unique_database): > I have created a related ticket in the past: I'll do this refactor in an other commit. http://gerrit.cloudera.org:8080/#/c/11000/1/tests/query_test/test_scanners.py@298 PS1, Line 298: hdfs_file = '/test-warehouse/{0}.db/{1}'.format(unique_database, filename) : check_call(['hdfs', 'dfs', '-copyFromLocal', '-f', local_file, hdfs_file]) > I am a bit worried about using /test-warehouse/. Your file names are unlike I chose the first option. -- To view, visit http://gerrit.cloudera.org:8080/11000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib8c41bfc7c1664bdba5099d3893dc8dbe4304794 Gerrit-Change-Number: 11000 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Bikramjeet Vig <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Tue, 31 Jul 2018 15:35:04 +0000 Gerrit-HasComments: Yes
