Take a look at a distributed data structure server, for example Redis, so
that the set of seen IDs lives outside the worker's memory. There are
various Storm integrations available.
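
A rough sketch of the idea, not a definitive implementation: the filter only
needs an "add if absent" operation, so the in-memory HashSet can sit behind a
small interface and be swapped for a remote store. All names here (SeenStore,
InMemorySeenStore, DistinctCheck) are hypothetical, and the in-memory class is
only a stand-in so the example runs without a server. A Redis-backed
implementation could issue SADD, which replies with the number of members
newly added, and treat a reply of 1 as "not seen before".

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical abstraction over a set that may live outside the worker JVM. */
interface SeenStore {
    /** Returns true only if the id was not already present. */
    boolean addIfAbsent(String id);
}

/** In-memory stand-in; a Redis-backed version would call SADD instead. */
class InMemorySeenStore implements SeenStore {
    private final Set<String> seen = new HashSet<>();

    @Override
    public synchronized boolean addIfAbsent(String id) {
        return seen.add(id);
    }
}

/** Same keep/drop decision as the quoted Distinct filter, without Trident types. */
class DistinctCheck {
    private final SeenStore store;

    DistinctCheck(SeenStore store) {
        this.store = store;
    }

    /**
     * Joins the fields with a separator so that ("ab", "c") and ("a", "bc")
     * produce different ids. Assumes fields never contain the separator.
     */
    static String idOf(String... fields) {
        return String.join("\u0001", fields);
    }

    /** Keeps a record only the first time its id is seen. */
    boolean isKeep(String... fields) {
        return store.addIfAbsent(idOf(fields));
    }
}
```

For a sliding window, one option is a separate Redis key per window with an
expiry set on it (EXPIRE), so old windows are dropped by the server rather
than accumulating.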

On Monday, July 14, 2014, 唐思成 <[email protected]> wrote:

> The use case is simple: count unique users within a sliding window. The
> common solution I found on the Internet is to use a HashSet to filter
> out duplicate users, like this:
>
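> import java.util.Collections;
> import java.util.HashSet;
> import java.util.Set;
>
> // Storm 0.9.x package names
> import storm.trident.operation.BaseFilter;
> import storm.trident.tuple.TridentTuple;
>
> public class Distinct extends BaseFilter {
>     private static final long serialVersionUID = 1L;
>
>     // Every id seen so far; lives on the heap and grows without bound.
>     private Set<String> distincter =
>             Collections.synchronizedSet(new HashSet<String>());
>
>     // Keep a tuple only the first time its id is seen.
>     @Override
>     public boolean isKeep(TridentTuple tuple) {
>         String id = this.getId(tuple);
>         return distincter.add(id);
>     }
>
>     // Build the id by concatenating all fields of the tuple.
>     public String getId(TridentTuple t) {
>         StringBuilder sb = new StringBuilder();
>         for (int i = 0; i < t.size(); i++) {
>             sb.append(t.getString(i));
>         }
>         return sb.toString();
>     }
> }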
>
> However, the HashSet is held in memory, so once the data grows very
> large I think it will cause an OutOfMemoryError (OOM).
> Is there a scalable solution?
>
> 2014-07-14
> ------------------------------
> 唐思成
>


-- 
Danijel Schiavuzzi

E: [email protected]
W: www.schiavuzzi.com
T: +385989035562
Skype: danijels7
