Re: Send ACK when all records of file are processed

2018-01-30 Thread Piotr Nowojski
When reading from input files, the readers will send Watermark(Long.MAX_VALUE) on all of their output edges at the EOF event, and those watermarks will be propagated accordingly. So your ACK operator will get Watermark(Long.MAX_VALUE) only when it has received it from ALL of its input edges. When
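
For illustration, a minimal sketch of such an ACK stage at the low-level operator API; the class name AckOnEofOperator and sendAck() are hypothetical and not from the thread:

    import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
    import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
    import org.apache.flink.streaming.api.watermark.Watermark;
    import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

    // Hypothetical ACK operator: forwards records unchanged and reacts once the
    // combined input watermark (the minimum over all input edges) reaches
    // Long.MAX_VALUE, i.e. once every upstream reader has hit EOF.
    public class AckOnEofOperator<T> extends AbstractStreamOperator<T>
            implements OneInputStreamOperator<T, T> {

        @Override
        public void processElement(StreamRecord<T> element) throws Exception {
            output.collect(element);
        }

        @Override
        public void processWatermark(Watermark mark) throws Exception {
            super.processWatermark(mark);
            if (mark.getTimestamp() == Long.MAX_VALUE) {
                // All input edges have delivered Watermark(Long.MAX_VALUE):
                // the whole file has been read and processed upstream.
                sendAck();
            }
        }

        private void sendAck() {
            // Placeholder: write a marker row, touch a file, call a REST hook, ...
        }
    }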

Re: Send ACK when all records of file are processed

2018-01-25 Thread Piotr Nowojski
Hi, If an operator has multiple inputs, its watermark will be the minimum of all of its inputs. Thus your hypothetical “ACK Operator” will get Watermark(Long.MAX_VALUE) only when all of the preceding operators report Watermark(Long.MAX_VALUE). Yes, instead of simply adding a sink, you would have
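
If dropping down to the operator API is not desirable, a similar effect can be sketched with the public ProcessFunction API by registering an event-time timer at Long.MAX_VALUE; the timer fires only once the operator's input watermark (the minimum over all inputs) reaches that value. This is an assumed alternative, not something spelled out in the thread, and it fires once per key rather than once globally:

    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;

    // Sketch: applied after keyBy(), this forwards records and registers an
    // event-time timer at Long.MAX_VALUE. Timers fire once the watermark passes
    // their timestamp, so onTimer() runs only after every input has reported
    // Watermark(Long.MAX_VALUE). Note it fires once per key; a single global
    // ACK is easier with a parallelism-1 operator at the end of the pipeline.
    public class AckTimerFunction extends ProcessFunction<String, String> {

        @Override
        public void processElement(String value, Context ctx, Collector<String> out) throws Exception {
            ctx.timerService().registerEventTimeTimer(Long.MAX_VALUE);
            out.collect(value);
        }

        @Override
        public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) {
            // Reached only after the EOF watermark: emit or record the ACK here.
        }
    }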

Re: Send ACK when all records of file are processed

2018-01-25 Thread Vinay Patil
Hi Piotrek, Thank you for your detailed answer. Yes, I want to generate the ack when all the records of the file are written to the DB. So, to understand what you are saying, we will receive a single EOF watermark value at the ack operator when all the downstream operators have processed all the records of
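
Wiring-wise, a sketch under the same assumptions as above (reusing the hypothetical AckOnEofOperator) would replace or follow the plain sink with a single parallelism-1 ack stage, so that it sees the final watermark from every upstream subtask:

    import org.apache.flink.streaming.api.datastream.DataStream;

    public class AckWiring {
        // Hypothetical helper: attach one parallelism-1 ACK stage behind the
        // stream that enriches the records and writes them to the DB. The stage
        // receives Watermark(Long.MAX_VALUE) only after all upstream subtasks
        // have finished the file, and only then sends the acknowledgement.
        public static void attachAck(DataStream<String> enriched) {
            enriched
                    .transform("ack-on-eof", enriched.getType(), new AckOnEofOperator<String>())
                    .setParallelism(1);
        }
    }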

Re: Send ACK when all records of file are processed

2018-01-25 Thread Piotr Nowojski
Hi, As you figured out, a dummy EOF record is one solution; however, you could also achieve it by wrapping an existing CSV function. Your wrapper could emit this dummy EOF record. Another (probably better) idea is to use Watermark(Long.MAX_VALUE) as the EOF marker. Stream source
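
As a sketch of the first option (the names EofMarkerSource and eofMarker are made up for illustration), a wrapper source could delegate to the existing CSV-reading source and append one dummy record after it finishes:

    import org.apache.flink.streaming.api.functions.source.SourceFunction;

    // Hypothetical wrapper: runs the wrapped CSV source to completion and then
    // emits one dummy EOF marker record that downstream operators can detect.
    public class EofMarkerSource implements SourceFunction<String> {

        private final SourceFunction<String> wrapped;
        private final String eofMarker;

        public EofMarkerSource(SourceFunction<String> wrapped, String eofMarker) {
            this.wrapped = wrapped;
            this.eofMarker = eofMarker;
        }

        @Override
        public void run(SourceContext<String> ctx) throws Exception {
            wrapped.run(ctx);       // read the whole file first
            ctx.collect(eofMarker); // then append the dummy EOF record
        }

        @Override
        public void cancel() {
            wrapped.cancel();
        }
    }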

Send ACK when all records of file are processed

2018-01-24 Thread Vinay Patil
Hi Guys, Following is how my pipeline looks (DataStream API): [1] Read the data from the CSV file, [2] KeyBy some id, [3] Do the enrichment and write it to the DB. Step [1] reads the data in sequence as it has a parallelism of 1, and the other operators use the default parallelism. I want to
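
A rough skeleton of that pipeline in the DataStream API (the path, the key extraction, and EnrichAndWrite are placeholders, not taken from the thread) could look like:

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class CsvToDbPipeline {

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // [1] Read the CSV file with a single-parallelism source, so records
            //     are read in sequence.
            DataStream<String> lines = env
                    .readTextFile("/path/to/input.csv")
                    .setParallelism(1);

            // [2] Key the records by some id extracted from each line.
            lines.keyBy(line -> line.split(",")[0])
                    // [3] Enrich each record and write it to the DB; the replies
                    //     suggest following this with a parallelism-1 ACK stage
                    //     instead of a plain sink.
                    .map(new EnrichAndWrite());

            env.execute("csv-to-db");
        }

        // Placeholder enrichment step: in the real job this would look up the
        // extra fields and write the enriched record to the database.
        private static class EnrichAndWrite implements MapFunction<String, String> {
            @Override
            public String map(String record) {
                return record;
            }
        }
    }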