Hi all, I'm looking for approx_count and freq_item functions in Flink and I'm not sure which road to follow. So far I have found 2 viable options:
1. Wait for STREAMLINE to publish their code for HLL_DISTINCT_COUNT [1]
2. Use the Yahoo Datasketches lib [2], following the example of Tobias Lindener [3][4] (and maybe release a better, reusable third-party lib for Flink)

What do you advise? Is there any other ongoing effort on approximate statistics?

Best,
Flavio

[1] https://h2020-streamline-project.eu/wp-content/uploads/2018/10/Streamline-D5.5-Final.pdf
[2] https://datasketches.github.io
[3] https://github.com/tlindener/ApproximateQueries/
[4] https://www.slideshare.net/SeattleApacheFlinkMeetup/approximate-queries-and-graph-streams-on-apache-flink-theodore-vasiloudis-seattle-apache-flink-meetup