Hi everybody!

 Perhaps the following papers are relevant to the discussion here
 (their contact authors have been cc'd):

 1. The following paper proposes effective algorithms for estimating
    n_distinct from block-level samples:

 "Effective use of block-level sampling in statistics estimation"
 by Chaudhuri, Das and Srivastava, SIGMOD 2004.


 2. In a single scan, it is possible to estimate n_distinct by using
    a very simple algorithm:

 "Distinct sampling for highly-accurate answers to distinct value
  queries and event reports" by Gibbons, VLDB 2001.


 3. Gibbons's basic idea has since been extended to sliding windows
    (an extension that is useful in streaming systems such as Aurora and STREAM):

 "Distributed streams algorithms for sliding windows"
 by Gibbons and Tirthapura, SPAA 2002.
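
 To make the single-scan idea in item 2 concrete, here is a minimal Python
 sketch. It assumes the core of distinct sampling is hash-based level
 sampling: each value gets a level from the trailing zero bits of its hash,
 and the sample keeps only values at or above a threshold that rises whenever
 the sample outgrows a fixed bound. It omits the per-value tuple samples the
 paper keeps for event reports. The names DistinctSample and hash_level and
 the capacity parameter are hypothetical, chosen for illustration; they are
 not from the paper.

```python
import hashlib


def hash_level(value, bits=64):
    """Number of trailing zero bits in a 64-bit hash of `value`.

    For a well-mixed hash, P(level >= i) = 2**-i.
    """
    digest = hashlib.blake2b(repr(value).encode("utf-8"), digest_size=8).digest()
    h = int.from_bytes(digest, "big")
    if h == 0:
        return bits
    return (h & -h).bit_length() - 1


class DistinctSample:
    """One-pass n_distinct estimator using hash-level sampling (illustrative).

    Every distinct value whose level is >= self.level is kept together with
    its occurrence count. When the sample outgrows `capacity`, the level is
    raised and values below the new level are evicted.
    """

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.level = 0
        self.sample = {}  # value -> [hash level, occurrence count]

    def add(self, value):
        entry = self.sample.get(value)
        if entry is not None:
            # Already sampled: its level is >= the current level, just count it.
            entry[1] += 1
            return
        lvl = hash_level(value)
        if lvl < self.level:
            return  # not sampled at the current level
        self.sample[value] = [lvl, 1]
        while len(self.sample) > self.capacity:
            self.level += 1
            self.sample = {v: e for v, e in self.sample.items() if e[0] >= self.level}

    def estimate(self):
        # A distinct value survives at level l with probability 2**-l,
        # so scale the number of sampled values back up.
        return len(self.sample) * 2 ** self.level
```

 Feeding every value of a column through add() and then calling estimate()
 gives the n_distinct estimate after one pass, using memory bounded by
 capacity. Because the level never decreases, a value still in the sample has
 been there since its first occurrence, so its stored count is exact.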



 Gurmeet Singh Manku                      Google Inc.
 http://www.cs.stanford.edu/~manku    (650) 967 1890
