Take a look at a distributed data structure server, for example Redis, so the set of seen IDs lives outside the worker's heap. There are various Storm integrations available.
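As a rough sketch of the idea (not a tested Storm integration): hide the set of seen IDs behind a small interface, so the filter no longer cares where the set is stored. All names here (`SeenStore`, `InMemorySeenStore`, `DistinctGate`, the window key) are hypothetical. The in-memory class is only a stand-in so the example runs without a server. A Redis-backed implementation would presumably map `addIfAbsent` to the `SADD` command, which reports how many members were newly added, and `count` to `SCARD`.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical abstraction: where the set of already-seen ids is kept.
interface SeenStore {
    // Returns true only the first time an id is added under the given window key.
    boolean addIfAbsent(String windowKey, String id);

    // Number of distinct ids recorded under the given window key.
    long count(String windowKey);
}

// In-memory stand-in so the sketch runs without a server.
// Assumption: a Redis-backed version would use SADD / SCARD on one set per window.
class InMemorySeenStore implements SeenStore {
    private final Map<String, Set<String>> sets = new HashMap<>();

    @Override
    public synchronized boolean addIfAbsent(String windowKey, String id) {
        return sets.computeIfAbsent(windowKey, k -> new HashSet<>()).add(id);
    }

    @Override
    public synchronized long count(String windowKey) {
        Set<String> s = sets.get(windowKey);
        return s == null ? 0 : s.size();
    }
}

// Same contract as isKeep in the quoted filter: keep a tuple only if it is unseen.
class DistinctGate {
    private final SeenStore store;

    DistinctGate(SeenStore store) {
        this.store = store;
    }

    boolean isKeep(String windowKey, String... fields) {
        // Join with a separator so ("ab", "c") and ("a", "bc") do not collide.
        return store.addIfAbsent(windowKey, String.join("\u0000", fields));
    }
}

class DistinctSketchDemo {
    public static void main(String[] args) {
        SeenStore store = new InMemorySeenStore();
        DistinctGate gate = new DistinctGate(store);
        System.out.println(gate.isKeep("window-1", "alice")); // first sighting
        System.out.println(gate.isKeep("window-1", "alice")); // duplicate
        System.out.println(store.count("window-1"));
    }
}
```

Using one set per window key means an old window can be dropped as a whole (with Redis, for instance, by putting an expiry on the key) instead of removing IDs one by one.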
On Monday, July 14, 2014, 唐思成 <[email protected]> wrote:

> The use case is simple: count unique users in a sliding window. The
> common solution I found on the Internet is to use a HashSet to filter
> out duplicate users, like this:
>
> public class Distinct extends BaseFilter {
>     private static final long serialVersionUID = 1L;
>
>     private Set<String> distincter = Collections.synchronizedSet(new HashSet<String>());
>
>     @Override
>     public boolean isKeep(TridentTuple tuple) {
>         String id = this.getId(tuple);
>         return distincter.add(id);
>     }
>
>     public String getId(TridentTuple t) {
>         StringBuilder sb = new StringBuilder();
>         for (int i = 0; i < t.size(); i++) {
>             sb.append(t.getString(i));
>         }
>         return sb.toString();
>     }
> }
>
> However, the HashSet is stored in memory, so when the data grows very
> large, I think it will cause an OOM.
> So is there a scalable solution?
>
> 2014-07-14
> ------------------------------
> 唐思成
>

--
Danijel Schiavuzzi

E: [email protected]
W: www.schiavuzzi.com
T: +385989035562
Skype: danijels7
