Hey Ananth,

Thanks for posting this, and for working on the Kudu sink for Apex.

One thing I wanted to note in the article:

"Kudu output operator allows the client side timestamps to be propagated to the Kudu server where the mutation is executed. This allows for out of sequence data tuples to be ordered on the server side. The following snippet of code in the upstream operator shows how this can be done."

I think your understanding of the setPropagatedTimestamp() call is not quite right. The propagated timestamp serves as a lower bound for the timestamp assigned at the server side, not as an exact setting of the server-side timestamp. Thus, if you perform two inserts, and the second insert has a lower propagated timestamp, it does _not_ ensure that the first one takes precedence. Since the propagated timestamp is only a lower bound, the second insert will still be assigned a higher timestamp than the first.

The purpose of this advanced API is to allow causal ordering to be maintained between two writes. For example, imagine that client A writes data from machine A, and then communicates with client B on machine B. Then, client B performs a write. If we want to ensure that B's write is assigned a higher timestamp than A's, the setPropagatedTimestamp() API can ensure that (by setting the timestamp of A's write as the lower bound for B's write). But it can't be used to back-date a write, as the article seems to be implying.

Otherwise, the post is great! Thanks again for sharing your experience and application.

-Todd

On Tue, May 30, 2017 at 11:33 AM, Ananth G <[email protected]> wrote:

> Hello All,
>
> Apache apex now enables low latency high throughput writes to Kudu as a
> sink. More details on this on the atrato blog here: http://www.atrato.io/
> blog/2017/05/28/apex-kudu-output/ . Please use the comments section to
> provide any feedback.
>
> Regards,
> Ananth
>

--
Todd Lipcon
Software Engineer, Cloudera
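
P.S. To make the lower-bound behaviour concrete, here is a toy model in Java. It is only a sketch under stated assumptions: ToyServerClock and PropagationDemo are made-up names for illustration, and the clock is a simple Lamport-style counter, not Kudu's real hybrid clock and not the Kudu client API. It shows the two points above: a lower propagated timestamp does not back-date a write, and propagating A's timestamp to B keeps B's write ordered after A's even on a different server.

```java
// Toy model only: a Lamport-style logical clock standing in for a server.
// This is NOT how Kudu implements its clock; all names here are hypothetical.
final class ToyServerClock {
    private long last = 0;

    // Assign a timestamp strictly greater than both the last timestamp this
    // server handed out and the client's propagated timestamp (a lower bound).
    synchronized long assign(long propagated) {
        last = Math.max(last, propagated) + 1;
        return last;
    }
}

final class PropagationDemo {
    public static void main(String[] args) {
        ToyServerClock server = new ToyServerClock();

        // Two inserts on the same server; the second carries a LOWER
        // propagated timestamp, yet it is still ordered after the first.
        long first = server.assign(100);
        long second = server.assign(50);
        System.out.println(first + " " + second);   // prints "101 102"

        // Causal ordering across servers: client A's write timestamp is
        // handed to client B, who uses it as the lower bound for its write.
        ToyServerClock otherServer = new ToyServerClock();
        long writeB = otherServer.assign(second);
        System.out.println(writeB > second);        // prints "true"
    }
}
```

Without the propagated value, the second toy server would start from its own counter and could assign B's write a timestamp below A's, which is exactly the situation the propagation is meant to prevent.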
