[R] R-help Time Series
Hi Folks,

While I was browsing in the R-help archives yesterday, I got curious about the time series of the sizes of the monthly archives in MB.

This turned out to have an unexpected feature or two, which I leave to readers to explore for themselves.

I'm now wondering at what point in time we might expect to be receiving 1000MB/month (30+MB/day). It's not that far away, it seems, but there are a couple of interesting modelling questions behind it.

In particular, I wonder by what mechanism the numbers grow, according to the law which the data seem to indicate.

Over to you.

(just my 0.001 MB worth ... excluding headers)

Ted

To save you the trouble, the following sets up the series:

MB <- c(55,19,19,18,19,17,35,27,47,
        55,32,50,55,41,49,50,28,53,42,81,54,
        99,60,84,80,76,75,78,61,83,97,141,122,
        96,144,173,153,226,202,131,165,183,175,168,187,
        240,272,262,195,236,244,285,249,326,345,392,268,
        455,320,418,453,468,422,447,400,323,516,478,327,
        450,487,535,658,573,606,659,543,655,722,677,567,
        519,703,886,793,719,816,812,730,698,831,969,736,
        855)

April 1997 -- January 2005

E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 26-Feb-05 Time: 10:31:05
-- XFMail --
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] R-help Time Series
I think

1) You have the units wrong: these appear to be the figures quoted for KB of compressed files, and the compression is nothing like 1024:1.

2) This is not `a series' unless you add a time base, e.g. via a call to ts().

Surely subscribers are aware that they do not get many MB/day and that extrapolation to that level is just speculation.

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
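Point 2 can be made concrete with a short sketch. It assumes the figures are monthly values in KB running from April 1997 to January 2005, as stated in the original posting; the object names KB and KB.ts are chosen here purely for illustration.

```r
# Monthly archive sizes in KB (values copied from the original posting)
KB <- c(55,19,19,18,19,17,35,27,47,
        55,32,50,55,41,49,50,28,53,42,81,54,
        99,60,84,80,76,75,78,61,83,97,141,122,
        96,144,173,153,226,202,131,165,183,175,168,187,
        240,272,262,195,236,244,285,249,326,345,392,268,
        455,320,418,453,468,422,447,400,323,516,478,327,
        450,487,535,658,573,606,659,543,655,722,677,567,
        519,703,886,793,719,816,812,730,698,831,969,736,
        855)

# Attach a monthly time base starting in April 1997
KB.ts <- ts(KB, start = c(1997, 4), frequency = 12)

start(KB.ts)      # first period as (year, month)
end(KB.ts)        # last period as (year, month)
frequency(KB.ts)  # observations per year
```

With the time base attached, plot(KB.ts) labels the horizontal axis by calendar year, and window(KB.ts, start = c(1998, 1)) would select a sub-period by date rather than by index.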
Re: [R] R-help Time Series
On 26-Feb-05 Prof Brian Ripley wrote:
> I think
>
> 1) You have the units wrong: these appear to be the figures quoted
> for KB of compressed files, and the compression is nothing like
> 1024:1.

Sorry, yes, you are correct: it is KB and not MB (a slip of the eye on my part).

> 2) This is not `a series' unless you add a time base, e.g. via a
> call to ts().

Well, in R terms that is strictly correct; but a sequence of data corresponding to successive regular time points is usually described as a time series!

> Surely subscribers are aware that they do not get many MB/day and
> that extrapolation to that level is just speculation.

Granted (see above). But anyway, this sort of thing is not the real point, which is that (regardless of units) this 'sequence' of data has interesting features (which prompted me to submit my somewhat tongue-in-cheek posting).

The original (quoted below) now suitably amended.

> On Sat, 26 Feb 2005 [EMAIL PROTECTED] wrote:
>
>> Hi Folks,
>>
>> While I was browsing in the R-help archives yesterday,
>> I got curious about the time series of the sizes of
>> the monthly archives in KB [was MB].
>>
>> This turned out to have an unexpected feature or two,
>> which I leave to readers to explore for themselves.
>>
>> I'm now wondering at what point in time we might expect
>> to be receiving 1000KB/month (30+KB/day) [was MB]. It's not that
>> far away, it seems, but there are a couple of interesting
>> modelling questions behind it.
>>
>> In particular, I wonder by what mechanism the numbers
>> grow, according to the law which the data seem to indicate.
>>
>> Over to you.
>>
>> (just my 0.001 MB worth ... excluding headers)
>>
>> Ted
>>
>> To save you the trouble, the following sets up the sequence
>> [was series; and MB]:
>>
>> KB <- c(55,19,19,18,19,17,35,27,47,
>>         55,32,50,55,41,49,50,28,53,42,81,54,
>>         99,60,84,80,76,75,78,61,83,97,141,122,
>>         96,144,173,153,226,202,131,165,183,175,168,187,
>>         240,272,262,195,236,244,285,249,326,345,392,268,
>>         455,320,418,453,468,422,447,400,323,516,478,327,
>>         450,487,535,658,573,606,659,543,655,722,677,567,
>>         519,703,886,793,719,816,812,730,698,831,969,736,
>>         855)
>>
>> April 1997 -- January 2005

E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 26-Feb-05 Time: 12:06:29
-- XFMail --
Re: [R] R-help Time Series
Ted.Harding at nessie.mcc.ac.uk writes:

: While I was browsing in the R-help archives yesterday,
: I got curious about the time series of the sizes of
: the monthly archives in MB.

There were some discussions on this about a year ago:

http://tolstoy.newcastle.edu.au/R/help/04/04/1071.html
http://tolstoy.newcastle.edu.au/R/help/04/04/1095.html
http://tolstoy.newcastle.edu.au/R/help/04/04/1109.html
Re: [R] R-help Time Series
Prof Brian Ripley wrote:
> I think
>
> 1) You have the units wrong: these appear to be the figures quoted
> for KB of compressed files, and the compression is nothing like
> 1024:1.
>
> 2) This is not `a series' unless you add a time base, e.g. via a
> call to ts().
>
> Surely subscribers are aware that they do not get many MB/day and
> that extrapolation to that level is just speculation.

So let's be immensely unfair and do some speculation ...

Assume that 1000MB/month means a compressed archive file of (very) *roughly* 250MB.

Looking at the data with linear models, lm(sqrt(MB) ~ monthindex) seems not to be the worst model (removing the first observation, perhaps).

So a very *rough* extrapolation shows us that we will get 1000MB/month around 2142. I hope I'll get a better machine in 137 years to handle all the expected traffic. ;-)

Best,
Uwe
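The square-root model and the extrapolation described above can be sketched as follows. This is an illustration under stated assumptions, not the code actually used in the thread: the figures are treated as KB (per the correction earlier in the thread), the month index starts at 1 for April 1997, the first observation is dropped, and the helper names month_reaching and year_of are hypothetical.

```r
# Monthly archive sizes in KB (values copied from the original posting)
KB <- c(55,19,19,18,19,17,35,27,47,
        55,32,50,55,41,49,50,28,53,42,81,54,
        99,60,84,80,76,75,78,61,83,97,141,122,
        96,144,173,153,226,202,131,165,183,175,168,187,
        240,272,262,195,236,244,285,249,326,345,392,268,
        455,320,418,453,468,422,447,400,323,516,478,327,
        450,487,535,658,573,606,659,543,655,722,677,567,
        519,703,886,793,719,816,812,730,698,831,969,736,
        855)
d <- data.frame(month = seq_along(KB), KB = KB)

# Linear model on the square-root scale, dropping the first observation
fit <- lm(sqrt(KB) ~ month, data = d, subset = month > 1)
a <- coef(fit)[["(Intercept)"]]
b <- coef(fit)[["month"]]

# Month index at which the fitted curve (a + b * month)^2 reaches a target in KB
month_reaching <- function(target_kb) (sqrt(target_kb) - a) / b

# Calendar time for a month index, with month 1 = April 1997 (1997.25)
year_of <- function(month) 1997 + (month + 2) / 12

year_of(month_reaching(1000))        # archive of 1000 KB
year_of(month_reaching(250 * 1024))  # archive of roughly 250 MB
```

The extrapolated dates are point estimates from the fitted mean on the square-root scale only; they carry no interval and ignore the noise term, so they are speculation in exactly the sense the thread intends.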
Re: [R] R-help Time Series
On 26-Feb-05 Uwe Ligges wrote:
> So let's be immensely unfair and do some speculation ...
>
> Assuming 1000MB/month means a compressed archive file of (very)
> *roughly* 250MB.
>
> Looking at the data with linear models, lm(sqrt(MB) ~ monthindex)
> seems not to be the worst model (removing the first observation,
> perhaps).

Well, Uwe, disregarding the confusion over units, your model (and caveat) coincides with mine, leading to

  KB = (2.448731 + 0.282701*T + noise)^2

where T is in months and the first observation is at T=1. (Actually, I think the initial burn-in might be a bit longer, say over 3-4 months, and there is a slight suggestion that the growth has been slightly flattening out recently).

And what intrigued me is the question: what mechanism might lead to a quadratic growth law?

One possible interpretation is the following: Suppose that the number of postings is proportional to the number of R users. A quadratic has first difference linear in T. So the average number of additional postings per month can be seen as a sum of two components:

a) a constant kernel
b) a component proportional to the increment in postings in the previous month.

Interpreting this as number of users, it could suggest that recruitment to R could be due to two causes: recruitment by a core of fixed size, and recruitment by recent recruits!

Of course this is far from the only possibility. Another might be that postings to R-help reflect the number of issues that users are concerned to get help or information on. This might reflect:

a) a core of die-hard FAQs asked by more and more people;
b) a growing corpus of packages which more and more people need guidance with.

And so on.

An essential missing piece of data (where I'm concerned) is the sequence of numbers of subscribers to R-help.

Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 26-Feb-05 Time: 14:00:46
-- XFMail --
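The remark that a quadratic has a first difference linear in T can be checked numerically. The sketch below takes the two coefficients reported in the message above as given (they are not re-estimated here) and drops the noise term, so it describes only the smooth curve (a + b*T)^2; the variable names are illustrative.

```r
# Coefficients as reported in the message above, noise term omitted
a <- 2.448731
b <- 0.282701

month <- 1:120
curve_KB <- (a + b * month)^2

# First differences: (a + b*(T+1))^2 - (a + b*T)^2 = 2*a*b + b^2 + 2*b^2*T
d1 <- diff(curve_KB)

# Second differences: constant, equal to 2 * b^2
d2 <- diff(curve_KB, differences = 2)

range(d2)   # both ends equal 2 * b^2 up to rounding
```

The constant part of d1, 2*a*b + b^2, plays the role of the "constant kernel" in the interpretation above, and the part that grows by 2*b^2 each month is the component tied to the previous month's increment.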