[computer-go] OT: median of a data stream
Apologies for the off-topic post, but I know lots of people here are interested in statistics and algorithms. Calculating the mean of a stream of numbers [1] is easy: just keep track of the sum and the count, and divide at the end. But what about the median? I think I always need to buffer at least half the numbers, but does anyone know for sure, or know a clever algorithm [2]? Darren [1]: In a desperate attempt at on-topic-ness lets pretend they are thinking times for each move in all games in a large go game collection. [2]: If I know the number of distinct numbers is relatively small I can make the histogram and derive the median that way. E.g. 100 distinct numbers means I just need 100 counters. I also get the mode as a bonus. -- Darren Cook http://dcook.org/mlsn/ (English-Japanese-German-Chinese free dictionary) http://dcook.org/work/ (About me and my work) http://dcook.org/work/charts/ (My flash charting demos) ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] OT: median of a data stream
On 8/16/07, Darren Cook [EMAIL PROTECTED] wrote: Apologies for the off-topic post, but I know lots of people here are interested in statistics and algorithms. Calculating the mean of a stream of numbers [1] is easy: just keep track of the sum and the count, and divide at the end. But what about the median? I think I always need to buffer at least half the numbers, but does anyone know for sure, or know a clever algorithm [2]? You'll have to remember all numbers. If you've seen a_0 a_1 a_2 ... a_n so far, then for any i, 0=i=n, you'll have to ouput a_i after an additional (n-i) values less than a_0, and i values greater than a_n. regards, -John ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] OT: median of a data stream
On 8/16/07, Darren Cook [EMAIL PROTECTED] wrote: Calculating the mean of a stream of numbers [1] is easy: just keep track of the sum and the count, and divide at the end. But what about the median? I think I always need to buffer at least half the numbers A counter for each unique value should always suffice and may be less than half the numbers in the stream. If you don't require huge precision you can save a lot by increasing the bin-size (just maintain a histogram). , but does anyone know for sure, or know a clever algorithm [2]? Not sure, but I guess you could look for an incremental update of any probability density approximation that has sufficient complexity for your purposes. An even more simplistic approach might be to compare incoming samples to the current estimate and increment/decrement by a small constant depending on whether it's higher or lower (but that may be too crude for your purposes). Best, Erik ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] OT: median of a data stream
You can use a binary tree. In each node you store the number of child nodes in the left subtree. (maybe also for the right subtree) If you want to handle duplicates, you could add yet another counter to the nodes. Finding the median is easy. As a bonus, you can also find the value for *any* rank or percentile. Cost is 2 or3 pointers plus 2 or 3 ints per node, say 20 bytes. If your input is reasonably random, balancing will not be needed. Avoiding dynamic allocation will probably speed things up. (and make deallocation easyer ;-) HTH, AvK ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/