Hi all, here is my suggestion: use Spark SQL rather than Impala to read Hive tables if you need consistently correct timestamp values. The situation is explained below:
I have come across a situation where a multi-tenant cluster is being used to read and write Parquet files. This causes issues because, as I understand it, when Hive stores a timestamp in Parquet format it converts local time to UTC, and when it reads the data back it converts UTC back to local time. Impala, on the other hand, performs no conversion when it reads a timestamp column from a Parquet file, so the raw UTC value is returned instead of local time.

So there are multiple issues:

- Data read by Impala is not converted from UTC to local time.
- A flag can be set to make Impala do the conversion, but only at the cluster level.
- One tenant group does not want the conversion done at the application level, so enabling the flag would cure their problem while making the other tenants on the same cluster less happy.

My understanding is that this issue arises because Impala bypasses the Hive metadata and goes directly to the Parquet files, and there is a real impact to the business. My suggestion is that if this group wants performant reads, they should use Spark SQL on the Hive tables; it will always return the same values as stored by Hive.

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
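P.S. For anyone who wants to see the mismatch concretely, here is a minimal sketch in plain Python that models the semantics described above (it does not touch Parquet itself; the choice of Europe/London as the cluster-local timezone is my assumption for illustration):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumed cluster-local timezone for this illustration.
LOCAL_TZ = ZoneInfo("Europe/London")

def hive_write(local_ts: datetime) -> datetime:
    """Model Hive's Parquet writer: local time is converted to UTC on write."""
    return local_ts.replace(tzinfo=LOCAL_TZ).astimezone(timezone.utc)

def hive_read(stored_ts: datetime) -> datetime:
    """Model Hive (and Spark SQL reading the Hive table):
    the stored UTC value is converted back to local time."""
    return stored_ts.astimezone(LOCAL_TZ)

def impala_read(stored_ts: datetime) -> datetime:
    """Model Impala's default behaviour: return the stored value unconverted."""
    return stored_ts

original = datetime(2016, 7, 1, 12, 0, 0)  # noon local time, BST (UTC+1)
stored = hive_write(original)

print(hive_read(stored))    # noon local again: the round trip is correct
print(impala_read(stored))  # 11:00 UTC: shifted by one hour relative to what was written
```

The point of the sketch is simply that the writer and reader must agree on the conversion: Hive and Spark SQL both apply the reverse conversion on read, while Impala (without the cluster-level flag) does not, which is exactly why the two engines show different values for the same Parquet data.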