Actually, the earliest paper that solves the distinct_n estimation problem in 1 pass is the following:
"Estimating simple functions on the union of data streams" by Gibbons and Tirthapura, SPAA 2001. http://home.eng.iastate.edu/~snt/research/streaming.pdf The above paper addresses a more difficult problem (1 pass _and_ a distributed setting). Gibbon's followup paper in VLDB 2001 limits the problem to a single machine and contains primarily experimental results (for a database audience). The algorithmic breakthrough had already been accomplished in the SPAA paper. Gurmeet -- ---------------------------------------------------- Gurmeet Singh Manku Google Inc. http://www.cs.stanford.edu/~manku (650) 967 1890 ---------------------------------------------------- ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])