Hello,

I'm trying to solve an ETL problem in Hive in which a partition of a Hive table needs to be restated because of late-arriving data. This means a new version of an already existing partition has to be introduced into the table. I need to do this while the table is serving queries, some of which could be reading the previous version of the partition. The intended behaviour is for currently running queries to finish reading the previous version of the partition while new queries pick up the new version. The data is in Parquet, which shouldn't really affect the implementation. Moving directories causes MR/Tez jobs that are reading them to fail.
Have folks had experience with such a use case? Are there things in Hive I can leverage instead of having to implement the ETL myself? One approach I'm looking at is to never move partition directories: only introduce new directories as new versions of the partition, and point the table's partition location at the new directory. Any currently running query would continue reading from the previous version's directory, since that directory was not moved from its original location.

Thanks,
-Gautam
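
P.S. To make the approach concrete, here is a rough sketch in Python that only generates the HiveQL. The table name `events`, the partition column `dt`, the warehouse path, the `v<N>` directory naming, and both helper names are made up for illustration. The only Hive feature it relies on is `ALTER TABLE ... PARTITION ... SET LOCATION`. That in-flight queries keep reading the old directory is my assumption above, not something this sketch demonstrates.

```python
def partition_dir(base_dir, spec, version):
    """Build a fresh directory path for one version of a partition.

    Hypothetical layout: <base_dir>/<col>=<val>/.../v<version>.
    Older version directories are never moved or overwritten.
    """
    parts = "/".join(f"{col}={val}" for col, val in spec.items())
    return f"{base_dir.rstrip('/')}/{parts}/v{version}"


def set_location_stmt(table, spec, location):
    """Build the HiveQL that repoints an existing partition at a new directory."""
    clause = ", ".join(f"{col}='{val}'" for col, val in spec.items())
    return f"ALTER TABLE {table} PARTITION ({clause}) SET LOCATION '{location}'"


# Hypothetical table, partition, and warehouse path.
spec = {"dt": "2015-01-01"}
new_dir = partition_dir("hdfs:///warehouse/events", spec, 2)
stmt = set_location_stmt("events", spec, new_dir)
print(stmt)
```

The restated data would be written into `new_dir` first, and the statement run only once that write has finished, so the metastore never points at a half-written directory. Old version directories would be cleaned up later, after any queries that might still be reading them have completed.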