Kuai Yu created GOBBLIN-615:
-------------------------------

             Summary: Make LWM==HWM a valid interval in QueryBaseSource
                 Key: GOBBLIN-615
                 URL: https://issues.apache.org/jira/browse/GOBBLIN-615
             Project: Apache Gobblin
          Issue Type: Improvement
            Reporter: Kuai Yu
            Assignee: Kuai Yu


We have seen many cases with DateWatermark where a job fails intermittently, 
every other day. The sequence is as follows:
 # On 10-02 at 17:47 the job pulls with logindate >= 2018-10-01 (HWM = 10-02; 
when the job finishes, Actual_HWM is 10-02).
 # If the job is re-run on the same day (10-02), we have LWM = 10-03 and 
HWM = 10-02, so the job fails as expected.
 # On 10-03 at 17:47 the job fails to generate any workunits because now LWM = 
Actual_HWM + 1 = 10-03 and HWM = 10-03. According to 
DateWatermark::getIntervals(), startTime must be less than endTime to generate 
an interval.
 # On 10-04 at 17:47 the job recovers because LWM stays at 10-03 and HWM = 
10-04, so a valid interval is generated again.

The fix is to let DateWatermark generate an interval when LWM == HWM, as in 
step 3, so that the job no longer fails intermittently there.
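
The difference between the current check and the proposed one can be sketched 
in a few lines of Java. This is only an illustration of the comparison, not 
Gobblin's code: the class name, the hasInterval helper and its allowEqual flag 
are hypothetical, and LWM is held fixed at 10-03 so that only the comparison 
changes between runs.

```java
import java.time.LocalDate;

public class WatermarkIntervalSketch {

    // Hypothetical helper, not part of Gobblin's API.
    // allowEqual = false models the current rule (startTime < endTime);
    // allowEqual = true models the proposed rule (startTime <= endTime).
    static boolean hasInterval(LocalDate lwm, LocalDate hwm, boolean allowEqual) {
        return allowEqual ? !lwm.isAfter(hwm) : lwm.isBefore(hwm);
    }

    public static void main(String[] args) {
        // LWM = Actual_HWM (10-02) + 1 day, as in steps 2 to 4.
        LocalDate lwm = LocalDate.of(2018, 10, 3);

        // HWM on 10-02 (re-run), 10-03 and 10-04.
        for (int day = 2; day <= 4; day++) {
            LocalDate hwm = LocalDate.of(2018, 10, day);
            System.out.printf("LWM=%s HWM=%s strict=%b inclusive=%b%n",
                    lwm, hwm,
                    hasInterval(lwm, hwm, false),
                    hasInterval(lwm, hwm, true));
        }
    }
}
```

With the strict rule only the 10-04 run yields an interval. With the inclusive 
rule the 10-03 run does too, while the same-day re-run (LWM after HWM) is still 
rejected.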

However, this fix causes another problem. Today we can miss data in steps 1 
and 4, because step 1 pulls data for 10-02 too early and step 4 pulls data for 
10-04 too early, but at least the data for 10-03 is pulled in full. After this 
fix, 10-03 will be pulled too early as well. The fix therefore needs to work 
together with the Cutoff feature, so that on 10-02 we only pull data for 10-01.
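
As a rough illustration of that last point, the sketch below caps the high 
watermark at a number of days before the run date. It is an assumption about 
how a cutoff could behave, not a description of Gobblin's Cutoff feature: the 
class name, the cappedHwm helper and the cutoffDays parameter are all 
hypothetical.

```java
import java.time.LocalDate;

public class CutoffSketch {

    // Hypothetical helper: move the high watermark back by cutoffDays so that
    // a run only covers days that have fully elapsed.
    static LocalDate cappedHwm(LocalDate runDate, int cutoffDays) {
        return runDate.minusDays(cutoffDays);
    }

    public static void main(String[] args) {
        LocalDate runDate = LocalDate.of(2018, 10, 2);
        LocalDate hwm = cappedHwm(runDate, 1);
        // With a one-day cutoff, the run on 10-02 stops at 10-01.
        System.out.println("Run on " + runDate + " pulls up to " + hwm);
    }
}
```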

Thanks,

Kuai



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
