Hi everybody!
Perhaps the following papers are relevant to the discussion here
(their contact authors have been cc'd):
1. The following paper proposes effective algorithms that use block-level
sampling for n_distinct estimation:
Effective use of block-level sampling in statistics
Actually, the earliest paper that solves the n_distinct estimation
problem in one pass is the following:
Estimating simple functions on the union of data streams
by Gibbons and Tirthapura, SPAA 2001.
http://home.eng.iastate.edu/~snt/research/streaming.pdf
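
For anyone who wants a feel for what a one-pass distinct-count estimator
can look like, here is a minimal Python sketch of adaptive hash-based
sampling. It is only an illustration in the spirit of this line of work,
not a transcription of the algorithm in the paper. The function names,
the use of blake2b as the hash, and the capacity parameter are all my own
assumptions.

```python
import hashlib


def hash_level(value):
    """Return the number of trailing zero bits in a 64-bit hash of value.

    A value lands on level >= l with probability about 2**-l.
    (Hypothetical helper; hashing repr(value) with blake2b is an assumption.)
    """
    digest = hashlib.blake2b(repr(value).encode("utf-8"), digest_size=8).digest()
    h = int.from_bytes(digest, "big")
    if h == 0:
        return 64
    return (h & -h).bit_length() - 1


def estimate_distinct(stream, capacity=1024):
    """Estimate the number of distinct values in one pass over stream.

    Keeps only values whose hash level is >= the current level. When the
    sample outgrows capacity, the level is raised, which drops roughly
    half of the sample. The estimate is sample size times 2**level.
    """
    level = 0
    sample = {}  # value -> hash level, so levels are computed once
    for value in stream:
        if value in sample:
            continue  # duplicates never change the sample
        lvl = hash_level(value)
        if lvl >= level:
            sample[value] = lvl
            while len(sample) > capacity:
                level += 1
                sample = {v: l for v, l in sample.items() if l >= level}
    return len(sample) * (2 ** level)
```

Memory stays bounded by capacity regardless of the input size, and
because sampling is decided by the hash alone, duplicates and row order
do not affect the result. While the number of distinct values is at or
below capacity, the count is exact.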
The above paper addresses