GitHub user takuti commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/108
  
    I tested `each_top_k`, `to_ordered_map` and `to_ordered_list` on the same 
MovieLens 1M data. As expected, `to_ordered_map` collapses duplicated keys: 
because the map is keyed by rating here, it returns only 3 entries even though 
we requested a top-10 aggregation.
    
    ```sql
    with topk as (
        select
            each_top_k(
                10, userid, rating,
                userid, movieid
            ) as (rank, rating, userid, movieid)
        from (
            select
                userid, movieid, rating
            from ratings
            cluster by userid
        ) t
    )
    select 
        count(1), collect_list(array(movieid, rating))
    from 
        topk 
    where 
        userid = 1
    ;
    ```
    
    > 10      [[527.0,5.0],[3105.0,5.0],[1270.0,5.0],[48.0,5.0],[1035.0,5.0],[1193.0,5.0],[1287.0,5.0],[2355.0,5.0],[595.0,5.0],[2804.0,5.0]]
    
    ```sql
    with topk as (
        select 
            userid, 
            to_ordered_map(rating, movieid, 10) as movies
        from
            ratings
        group by 
            userid
    )
    select 
        count(1), collect_list(array(movieid, rating))
    from 
        topk
    lateral view explode(movies) t as rating, movieid
    where 
        userid = 1
    ;
    ```
    
    > 3       [[2028,5],[1246,4],[745,3]]
    
    ```sql
    with topk as (
        select 
            userid, 
            to_ordered_list(movieid, rating, '-k 10') as movies
        from
            ratings
        group by 
            userid
    )
    select 
        size(movies), movies
    from 
        topk
    where 
        userid = 1
    ;
    ```
    
    > 10      [595,1035,3105,2355,1287,2804,1193,2028,1029,1270]
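
To illustrate why the map-based result shrinks to 3 rows while the list-based one keeps 10, here is a minimal Python sketch. It is a toy model, not Hivemall's implementation: the sample rows are hypothetical (movie ids borrowed from the outputs above, not the actual MovieLens rows for user 1), the helper names `top_k_as_map` and `top_k_as_list` are made up for this example, and which movie survives for a duplicated key is an assumption (here, the last one seen).

```python
import heapq

# Hypothetical (movieid, rating) pairs for a single user; NOT the real MovieLens rows.
ratings = [
    (527, 5), (3105, 5), (1270, 5), (48, 5), (1035, 5), (1193, 5),
    (1287, 5), (2355, 5), (595, 5), (2804, 5), (2028, 5),
    (1246, 4), (745, 3),
]


def top_k_as_map(pairs, k):
    """Toy model of a top-k map keyed by rating.

    A map holds one value per key, so every pair sharing a rating
    overwrites the previous one (assumption: last write wins).
    """
    by_rating = {}
    for movieid, rating in pairs:
        by_rating[rating] = movieid
    top_keys = sorted(by_rating, reverse=True)[:k]
    return [(by_rating[r], r) for r in top_keys]


def top_k_as_list(pairs, k):
    """Toy model of a top-k list: ties on rating are kept as separate entries."""
    top = heapq.nlargest(k, pairs, key=lambda p: p[1])
    return [movieid for movieid, _ in top]


map_result = top_k_as_map(ratings, 10)
list_result = top_k_as_list(ratings, 10)

print(len(map_result), map_result)    # one entry per distinct rating
print(len(list_result), list_result)  # k entries, ties included
```

With this sample the map variant can never return more entries than there are distinct ratings, whatever `k` is, whereas the list variant fills all 10 slots.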

