Spark Structured streaming - dropDuplicates with watermark

2018-12-03 Thread Nirmal Manoharan
I am trying to deduplicate streaming data using the dropDuplicates function with a watermark. The problem I am currently facing is that I have two timestamps for a given record: 1. One is the event timestamp, the time the record was created at the source. 2. The other is a transfer timestamp - t

Re: How to remove empty strings from JavaRDD

2016-04-07 Thread Nirmal Manoharan
Hi Greg, I use something similar in my application, though not for empty strings, so the example below is untested but should work:

    JavaRDD<String> filteredJavaRDD = example.filter(new Function<String, Boolean>() {
        public Boolean call(String arg0) throws Exception {
            // Keep only the elements that are not the empty string.
            return !arg0.equals("");
        }
    });

Thanks! Nirmal
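For anyone who wants to check the predicate itself without a cluster, here is a minimal sketch in plain Java. It assumes java.util streams as a stand-in for the RDD; the class name EmptyStringFilter and the helper removeEmpty are made up for illustration and are not part of Spark.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of Spark: it only exercises the
// "keep non-empty strings" predicate on an ordinary in-memory list.
class EmptyStringFilter {
    // Keep a string only if it is not "". Like arg0.equals(""), this
    // throws a NullPointerException if an element is null.
    static final Predicate<String> NON_EMPTY = s -> !s.isEmpty();

    // Returns a new list containing only the non-empty elements, in order.
    static List<String> removeEmpty(List<String> input) {
        return input.stream().filter(NON_EMPTY).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> example = Arrays.asList("spark", "", "rdd", "");
        System.out.println(removeEmpty(example)); // prints [spark, rdd]
    }
}
```

On Java 8 or later, the same condition can be passed to JavaRDD.filter as a lambda, for example example.filter(s -> !s.isEmpty()), because Spark's Function is a single-method interface. If the data may contain nulls, the check would need a null guard first.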