Hi, I have a Hive table partitioned by date and hour. New data arrives in the table every one to two hours. I am using Griffin to run profiling and accuracy checks on the data, with the job scheduled to run every hour. However, I want each run to process only the data that arrived since the previous run. Right now, Griffin picks up all the data in the Hive table on every run: the new data from the past hour plus the older data that previous runs already processed. I believe there should be some configuration options, set when creating the measure and the job, that avoid this and restrict processing to the data acquired in the last hour. I have tried various combinations of settings but have not been successful. Can someone please list the steps and configurations I need to set in the UI to achieve this? Any help is much appreciated.
Regards, Vikram
