Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/19769#discussion_r151557718
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
---
@@ -298,7 +304,10 @@ private void decodeDictionaryIds(
           // TODO: Convert dictionary of Binaries to dictionary of Longs
           if (!column.isNullAt(i)) {
             Binary v = dictionary.decodeToBinary(dictionaryIds.getDictId(i));
-           column.putLong(i, ParquetRowConverter.binaryToSQLTimestamp(v));
+           long rawTime = ParquetRowConverter.binaryToSQLTimestamp(v);
+           long adjTime = convertTz == null ? rawTime :
+             DateTimeUtils.convertTz(rawTime, convertTz, UTC);
+           column.putLong(i, adjTime);
--- End diff --
Oh, good point. I suppose to get test coverage for this, I'd have to try
to generate a Parquet file without dictionary encoding from Impala ...
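For context, the adjustment in the diff above can be sketched in a few lines of Python. This is only an illustration of the intent, not Spark's actual `DateTimeUtils.convertTz` implementation: it assumes the raw value is microseconds since the epoch read as a wall-clock time in the source zone, and `adjust` mirrors the patched branch's null check. The function and variable names here are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

EPOCH = datetime(1970, 1, 1)

def convert_tz(micros, from_zone, to_zone):
    # Treat `micros` as a wall-clock timestamp in `from_zone` and return the
    # micros value whose wall clock in `to_zone` names the same instant.
    # A sketch only; the real Spark code works on raw offsets in Java.
    wall = EPOCH + timedelta(microseconds=micros)
    instant = wall.replace(tzinfo=from_zone)
    new_wall = instant.astimezone(to_zone).replace(tzinfo=None)
    return round((new_wall - EPOCH).total_seconds() * 1_000_000)

def adjust(raw_micros, convert_zone):
    # Mirrors the patched branch: skip the shift when no conversion zone is set.
    if convert_zone is None:
        return raw_micros
    return convert_tz(raw_micros, convert_zone, timezone.utc)
```

For example, epoch wall-clock time in `America/Los_Angeles` (UTC-8 in January 1970) maps to eight hours' worth of microseconds in UTC.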
---