Hi,

I have a directory in HDFS containing 20 files with 150 million records.
I just want a random sample of 20 million records from that directory. I see that there are a few sampling implementations in Flink: https://github.com/eBay/Flink/tree/master/flink-java/src/main/java/org/apache/flink/api/java/sampling . Can someone provide a code example showing how to use them?

Here is my code to read from the HDFS files:

    final org.apache.flink.api.java.hadoop.mapred.HadoopInputFormat<LongWritable, Text> inputFormat =
            HadoopInputs.readHadoopFile(new TextInputFormat(), LongWritable.class, Text.class, hdfsPath);
    final DataSource<Tuple2<LongWritable, Text>> input =
            environment.createInput(inputFormat).withParameters(configs);

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
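
For reference, a fixed-size random sample like the one asked about above is commonly drawn with reservoir sampling, which the sampler classes in the linked package appear to be built around. The sketch below is plain Java with no Flink dependency and is not Flink's implementation; the class name ReservoirSampleSketch and its sample method are hypothetical names chosen for illustration. It keeps k records from a stream of unknown length in a single pass, so each record has the same chance of ending up in the sample.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;
import java.util.stream.IntStream;

// Hypothetical illustration class; not part of Flink.
public class ReservoirSampleSketch {

    /** Draws up to k items uniformly at random, without replacement, in one pass. */
    static <T> List<T> sample(Iterator<T> records, int k, Random rng) {
        List<T> reservoir = new ArrayList<>(k);
        long seen = 0;
        while (records.hasNext()) {
            T record = records.next();
            seen++;
            if (reservoir.size() < k) {
                // The first k records fill the reservoir directly.
                reservoir.add(record);
            } else {
                // Pick a slot uniformly in [0, seen); the record is kept
                // only if that slot falls inside the reservoir, which
                // happens with probability k / seen.
                long slot = (long) (rng.nextDouble() * seen);
                if (slot < k) {
                    reservoir.set((int) slot, record);
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        // Stand-in for the real input: 150 integers instead of 150 million records.
        Iterator<Integer> records = IntStream.range(0, 150).boxed().iterator();
        List<Integer> picked = sample(records, 20, new Random(42));
        System.out.println(picked.size() + " records sampled");
    }
}
```

This runs on a single machine and holds the whole sample in memory, so it only illustrates the idea. Inside a Flink job, the DataSetUtils class in org.apache.flink.api.java.utils looks like the intended entry point to those samplers (as far as I can tell it offers sample for a fraction and sampleWithSize for a fixed count), but please check the exact signatures against the Flink version in use.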