Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-27 Thread Gurmeet Manku
Hi everybody! Perhaps the following papers are relevant to the discussion here (their contact authors have been cc'd): 1. The following proposes effective algorithms for using block-level sampling for n_distinct estimation: Effective use of block-level sampling in statistics

[HACKERS] Citation for Bad n_distinct estimation; hacks suggested?

2005-05-02 Thread Gurmeet Manku
Actually, the earliest paper that solves the n_distinct estimation problem in one pass is the following: "Estimating simple functions on the union of data streams" by Gibbons and Tirthapura, SPAA 2001. http://home.eng.iastate.edu/~snt/research/streaming.pdf The above paper addresses