Hi,
I have a Hive table partitioned by date and by hour. Data arrives in the 
table every hour or every 2 hours. I am using Griffin to run certain 
profiling and accuracy checks on the data.
However, I want only the new data accumulated since the last job run to 
be processed. The job is scheduled to run every hour.
Right now, Griffin is picking up all the data present in the Hive table (the 
new data accumulated in the past hour plus the older data already processed by 
previous runs of the Griffin job). I believe there should be some configuration 
options when creating a measure and a job to avoid this and process only the 
data acquired in the last hour. I have tried various permutations and 
combinations but have not been successful.
Can someone please tell me which steps and configuration settings in the UI I 
need to apply in order to achieve the desired result?
Any help is much appreciated.

Regards,
Vikram
