Hi community, Top-N query is result-updating, according to [docs]( https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/sql/queries/topn/ ):
> The TopN query is *Result Updating*. Flink SQL will sort the input data stream according to the order key, so if the top N records have been changed, the changed ones will be sent as retraction/update records to downstream. It makes sense for streaming input. However, I feel it makes less sense for batch input. Why? Because the intermediate result of Top-N is not useful for a batch input. For example [here]( https://github.com/YikSanChan/pyflink-quickstart/blob/12b89341301c62072c4dfff4eecc39edad61ef9b/topk.py), I get a lot of intermediate output. For example, take a look at this input: ``` 1,100,1 1,101,3 2,201,4 2,200,2 2,202,5 2,203,6 1,102,7 1,103,8 1,104,9 2,204,10 ``` The first column being user_id, the second being movie_id, the third being score (generated by some ML model). I want to tell what the top 3 movies that each user_id should watch. Since the input is batch and finite, what I need is really just: 1 -> 102,103,104 2 -> 202,203,204 However, using this [PyFlink program]( https://github.com/YikSanChan/pyflink-quickstart/blob/12b89341301c62072c4dfff4eecc39edad61ef9b/topk.py), I get a ton of intermediate output: ``` +I(1,100) -U(1,100) +U(1,100,101) +I(2,201) -U(2,201) +U(2,201,200) -U(2,201,200) +U(2,201,200,202) -U(2,201,200,202) +U(2,201,202) -U(2,201,202) +U(2,201,202,203) -U(1,100,101) +U(1,100,101,102) -U(1,100,101,102) +U(1,101,102) -U(1,101,102) +U(1,101,102,103) -U(1,101,102,103) +U(1,102,103) -U(1,102,103) +U(1,102,103,104) -U(2,201,202,203) +U(2,202,203) -U(2,202,203) +U(2,202,203,204) ``` And since it is result-updating, I am not able to use a file as the sink. My question is: is there a way to generate top-n movies for each user, and make sure the output is not result-updating? Thank you! Best, Yik San