Hi Do,

In DataStream you can always implement your own sampling function, hopefully without too much effort.
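
As a rough illustration, here is a minimal sketch of what such a function could look like for simple Bernoulli sampling. The class name `BernoulliSampler`, its constructor parameters and the explicit seed are assumptions made for this example and are not part of the Flink API. It is plain Java, so it can be tried outside a job.

```java
import java.io.Serializable;
import java.util.Random;
import java.util.function.Predicate;

/** Hypothetical helper, not part of Flink: keeps each element with probability `fraction`. */
class BernoulliSampler<T> implements Predicate<T>, Serializable {
    private static final long serialVersionUID = 1L;

    private final double fraction;
    private final Random random;

    BernoulliSampler(double fraction, long seed) {
        // Reject values outside [0, 1], including NaN.
        if (!(fraction >= 0.0 && fraction <= 1.0)) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        this.fraction = fraction;
        this.random = new Random(seed);
    }

    /** Returns true if the element should be kept in the sample. */
    @Override
    public boolean test(T element) {
        // nextDouble() is uniform in [0, 1), so fraction 0 keeps nothing and 1 keeps everything.
        return random.nextDouble() < fraction;
    }
}
```

In a job, logic along these lines could be wrapped in a `FilterFunction` and passed to `DataStream.filter`. With parallelism greater than 1 each subtask would work on its own copy of the sampler, so a different seed per subtask is probably what you want.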
Adding such functionality to the API could be a good idea. But given that there is no “one-size-fits-all” solution in sampling (not every use case needs random sampling, and not every random sampler fits every workload), I am not sure we should start adding different sampling operators.

Thanks,
Kostas

> On Jul 9, 2016, at 5:43 PM, Greg Hogan <c...@greghogan.com> wrote:
>
> Hi Do,
>
> DataSet provides a stable @Public interface. DataSetUtils is marked
> @PublicEvolving which is intended for public use, has stable behavior, but
> method signatures may change. It's also good to limit DataSet to common
> methods whereas the utility methods tend to be used for specific
> applications.
>
> I don't have the pulse of streaming but this sounds like a useful feature
> that could be added.
>
> Greg
>
> On Sat, Jul 9, 2016 at 10:47 AM, Le Quoc Do <lequo...@gmail.com> wrote:
>
>> Hi all,
>>
>> I'm working on approximate computing using sampling techniques. I
>> recognized that Flink supports the sample function for Dataset
>> (org/apache/flink/api/java/utils/DataSetUtils.java). I'm just wondering why
>> you didn't merge the function to org/apache/flink/api/java/DataSet.java
>> since the sample function works as a transformation operator?
>>
>> The second question is that are you planning to support the sample
>> function for DataStream (within windows) since I did not see it in
>> DataStream code ?
>>
>> Thank you,
>> Do
>>