I am trying to deduplicate streaming data using the dropDuplicates
function with a watermark. The problem I am currently facing is that I have
two timestamps for a given record:
1. One is the eventtimestamp - the timestamp of the record's creation at the
source
2. The other is a transfer timestamp - t
Hi Greg,
I use something similar in my application, though not for empty strings,
so the example below is untested, but it should work.
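import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;

// Keep only the non-empty strings; `example` is assumed to be a JavaRDD<String>.
JavaRDD<String> filteredJavaRDD = example.filter(
    new Function<String, Boolean>() {
        @Override
        public Boolean call(String arg0) throws Exception {
            return !arg0.equals("");
        }
    });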
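If you want to sanity-check the condition before running it on a cluster, here is a rough plain-Java sketch of the same predicate applied to a local list. It does not use Spark at all; the class name `EmptyStringFilterDemo` and the helper `dropEmpty` are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class EmptyStringFilterDemo {
    // Same condition as the Spark filter: keep a value only if it is not "".
    static final Predicate<String> NOT_EMPTY = s -> !s.equals("");

    // Hypothetical helper: applies the predicate to a local list instead of an RDD.
    static List<String> dropEmpty(List<String> input) {
        return input.stream().filter(NOT_EMPTY).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Note that " " (a single space) is not the empty string, so it is kept.
        List<String> sample = Arrays.asList("a", "", "b", " ");
        System.out.println(dropEmpty(sample));
    }
}
```

One thing this makes visible: whitespace-only strings pass the check, so if those should be dropped too, the value would need to be trimmed before comparing.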
Thanks!
Nirmal