Martin,

There's also the work of a former PhD student in our Dept:
http://arxiv.org/pdf/1007.1032.pdf

Matias

On 24/09/2014 1:16 AM, Martin Maechler wrote:
Rolf Turner <[email protected]>
    on Wed, 24 Sep 2014 18:43:34 +1200 writes:

    > On 24/09/14 17:31, Mohan Radhakrishnan wrote:
    >> Hi,
    >>
    >> I have streaming data(1 TB) that can't fit in memory. Is
    >> there a way for me to find the median of these streaming
    >> integers assuming I can fit only a small part in memory ?
    >> This is about the statistical approach to find the median
    >> of a large number of values when I can inspect only a
    >> part of them due to memory constraints.

    > You cannot, I'm pretty sure, calculate the median
    > recursively. However there are "approximate" recursive
    > median algorithms which provide an estimate of location
    > that has the same asymptotic properties as the median.

    > See:

    > * U. Holst, Recursive estimators of location.
    > Commun. Statist. Theory Meth., vol. 16, 1987,
    > pp. 2201--2226.

    > and

    > * Murray A. Cameron and T. Rolf Turner, Recursive location
    > and scale estimators, Commun. Statist. Theory Meth.,
    > vol. 22, 1993, pp. 2503--2515.

This is really interesting to me, thank you, Rolf!

OTOH,

1) has your proposal ever been provided in R?
   I'd be happy to add it to the robustX
   (http://cran.ch.r-project.org/web/packages/robustX)
   or even robustbase
   (http://cran.ch.r-project.org/web/packages/robustbase)
   package.

2) Would anybody know of more recent research on the subject?
   (I quickly "googled around" and found research more geared
   for the time series situation which is more involved anyway)

--> Hence CC'ing the experts' list R-SIG-robust

Martin Maechler,
ETH Zurich

    > cheers,
    > Rolf Turner

    > --
    > Rolf Turner Technical Editor ANZJS

_______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-robust
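For readers following the thread, here is a minimal sketch of the general idea behind an "approximate" recursive median, written in Python. It is a generic stochastic-approximation update (nudge the running estimate up or down by a shrinking step, depending on which side each new value falls), and is not claimed to be the estimator from either of the papers cited above. The function name recursive_median and the parameter step_scale are hypothetical, and the step size step_scale / n is an assumption; in practice the scale would need tuning to the spread of the data.

```python
import random


def recursive_median(stream, step_scale=1.0):
    """Approximate the median of an iterable in one pass with O(1) memory.

    Illustrative sketch only: a sign-based stochastic-approximation update.
    step_scale is a hypothetical tuning constant (roughly the data spread).
    Returns None for an empty stream.
    """
    estimate = None
    for n, x in enumerate(stream, start=1):
        if estimate is None:
            # Start from the first observation.
            estimate = float(x)
            continue
        step = step_scale / n  # step sizes shrink as more data arrive
        if x > estimate:
            estimate += step   # new value above the estimate: move up
        elif x < estimate:
            estimate -= step   # new value below the estimate: move down
        # Equal values leave the estimate unchanged.
    return estimate


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated stream of integers; only one value is held at a time.
    data = (rng.randint(0, 1000) for _ in range(100_000))
    print(recursive_median(data, step_scale=1000.0))
```

Only the current estimate and a counter are kept, so memory use does not grow with the length of the stream. The result is an estimate of location near the median, not the exact sample median.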
