We have a scenario where we want to consume only delta updates from Hive tables. - Multiple producers are updating data in Hive table - Multiple consumer reading data from the Hive table
Consumption pattern: - Get all data that has been updated since last time I read. Is there any mechanism in Hive 3.0 which can enable above consumption pattern? I see there is a construct of row__id(writeid, bucketid, rowid) in ACID tables. - Can row__id be used in this scenario? - How is the "writeid" generated? - Is there some meta information which captures the time when the rows were actually visible for read?