Hello,

Thank you for the feedback. Indeed, it might be better to keep 
TimestampColumnStatsData if we want to extend it with nanosecond precision in 
the future.

I wanted to see what consequences removing the field would have on existing 
Hive instances. However, it turned out to be more difficult than expected to 
get it running: a ClassCastException occurred when the timestamp statistic was 
about to be extracted from the pipeline that calculates the statistics.

Kind regards,
Thomas

On 2/8/26 03:18, Shohei Okumiya wrote:
Hello,

I'm writing this message without having taken a deep look at HIVE-22311 and
HIVE-29398, so I may not be on the same page yet. Now that Hive and
Iceberg support nanosecond timestamps, I suppose a timestamp-specific
data structure will make sense in the future. Is it unrealistic to ask
Impala or other projects to update their implementations instead?

Best,
Okumin

On Wed, Feb 4, 2026 at 7:55 PM Stamatis Zampetakis <[email protected]> wrote:
Hello,

I'm not sure this is only about other projects that use the
metastore. If we change the storage for timestamps, aren't we going to
break even existing deployments of Hive in the next upgrade?

I am including the user@ list since this is not a pure dev discussion
but can also impact existing users.

Best,
Stamatis

On Wed, Feb 4, 2026 at 11:22 AM Thomas Rebele <[email protected]> wrote:
Hi Hive community,

I'm working on HIVE-29398 to make the Hive metastore more compatible with other 
projects that use it (e.g., Impala). In 2019, HIVE-22311 (Propagate min/max 
column values from statistics to the optimizer for timestamp type) introduced 
a struct TimestampColumnStatsData in the thrift definition. It seems that this 
change to the thrift code was not necessary, as the timestamp statistics can 
be passed via the existing LongColumnStatsData as well. Impala actually 
expects the statistics that way. I have worked on a property to switch back to 
the old behavior.
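
To illustrate what passing timestamp statistics through a long-based structure 
could look like, here is a minimal sketch. It assumes the min/max timestamps are 
encoded as whole seconds since the Unix epoch; the LongStats class and the 
method names are hypothetical stand-ins, not the generated thrift classes or 
the code of the actual patch. Note that any sub-second part is dropped by this 
encoding.

```java
import java.time.Instant;

final class TimestampAsLongStats {

    // Hypothetical stand-in for a long-based stats struct (not the real thrift class).
    static final class LongStats {
        final long low;
        final long high;

        LongStats(long low, long high) {
            this.low = low;
            this.high = high;
        }
    }

    // Assumption: timestamps are stored as whole seconds since the Unix epoch.
    // The nanosecond part of each Instant is discarded.
    static LongStats fromTimestamps(Instant min, Instant max) {
        return new LongStats(min.getEpochSecond(), max.getEpochSecond());
    }

    // Reader side: interpret the long values as epoch seconds again.
    static Instant lowAsInstant(LongStats stats) {
        return Instant.ofEpochSecond(stats.low);
    }

    static Instant highAsInstant(LongStats stats) {
        return Instant.ofEpochSecond(stats.high);
    }

    public static void main(String[] args) {
        LongStats stats = fromTimestamps(
                Instant.parse("2019-10-01T00:00:00Z"),
                Instant.parse("2026-02-04T11:22:00.123456789Z"));
        System.out.println(lowAsInstant(stats) + " .. " + highAsInstant(stats));
    }
}
```

A reader that only understands long statistics can consume such values without 
knowing about a timestamp-specific struct, at the cost of second-level 
granularity.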

In the review of the PR, Krisztian Kasa suggested asking the community whether 
it would be possible to undo the change of HIVE-22311. A while ago I prepared a 
patch to undo the changes to the thrift code, while still keeping the benefits 
of propagating the stats to the optimizer, so it is possible. I'm quite new to 
Hive, so I don't know much about the consequences of removing a field from the 
thrift code. Is it actually advisable to remove the timestampStats field from 
hive_metastore.thrift? Are there other projects that have started to use the 
timestamp stats field? If we decide to drop the field, those projects would 
need to use the long field instead.

Best regards,
Thomas Rebele
