-Sagar
On Sep 9, 2011, at 7:58 AM, Arvind Jayaprakash <[email protected]> wrote:

> On Sep 06, sagar naik wrote:
>> I can dedup based on timestamp of the event.
>> Can I increment the counter value and assign the version as the timestamp of
>> this event ?
>
> Is it because you have an infinitesimally fine grained timestamp, you
> assume two events won't happen at the "same time" (as defined by
> granularity of your clock) or just because the events are far and few?
>
> Also, are the events arriving in monotonically increasing order of time?

Yes.

> I assume that is not the case given that you talk of duplicates (it
> would be a real crazy system if duplicates always arrive exactly one
> after the other without any interleaving).

It is a simple case of MR (MapReduce). Jobs can fail, and we then have to
restart the job on the same input, which can lead to duplicates.
Also, the step between the pre-processor and the counter insertion may
produce the same event more than once (say, because of bugs), and I want
to make sure that even in that case the events are not counted twice.

> If the answer is no to either of the above solutions, then you need to
> rethink a bit.

Thanks again
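
To make the requirement concrete, here is a rough Python sketch of counting that stays correct when a failed job is re-run on the same input. It is only an illustration under assumptions, not how any particular datastore implements counters: `DedupCounter`, `record`, and `run_job` are hypothetical names, the in-memory dict and set stand in for the real store, and an event is assumed to be uniquely identified by its counter name plus timestamp. As noted in the quoted reply, that last assumption breaks if two distinct events share a timestamp at the clock's granularity, because the second one would be dropped as a duplicate.

```python
from collections import defaultdict


class DedupCounter:
    """Hypothetical in-memory stand-in for a counter store (illustration only)."""

    def __init__(self):
        self.counts = defaultdict(int)  # counter name -> current value
        self.seen = defaultdict(set)    # counter name -> timestamps already applied

    def record(self, name, timestamp):
        """Apply one event. Returns True if counted, False if skipped as a duplicate."""
        if timestamp in self.seen[name]:
            return False  # already counted, e.g. replayed by a restarted job
        self.seen[name].add(timestamp)
        self.counts[name] += 1
        return True


def run_job(counter, events):
    """Feed every (name, timestamp) event to the counter, as one job run would."""
    for name, timestamp in events:
        counter.record(name, timestamp)


# Made-up sample input: (counter name, event timestamp in milliseconds).
events = [
    ("clicks", 1315580280001),
    ("clicks", 1315580280002),
    ("views", 1315580280001),
]

counter = DedupCounter()
run_job(counter, events)
run_job(counter, events)  # simulated restart on the same input
```

After the second run the counts are unchanged, because every replayed event is recognised by its timestamp. The cost is that the set of seen timestamps has to be stored and consulted for every increment.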
