[R] R-help Time Series

2005-02-26 Thread Ted Harding
Hi Folks,

While I was browsing in the R-help archives yesterday,
I got curious about the time series of the sizes of
the monthly archives in MB.

This turned out to have an unexpected feature or two,
which I leave to readers to explore for themselves.

I'm now wondering at what point in time we might expect
to be receiving 1000MB/month (30+MB/day). It's not that
far away, it seems, but there are a couple of interesting
modelling questions behind it.

In particular, I wonder by what mechanism the numbers
grow, according to the law which the data seem to indicate.

Over to you.

(just my 0.001 MB worth ... excluding headers)

Ted

To save you the trouble, the following sets up the series:

MB <- c(55,19,19,18,19,17,35,27,47,
55,32,50,55,41,49,50,28,53,42,81,54,
99,60,84,80,76,75,78,61,83,97,141,122,
96,144,173,153,226,202,131,165,183,175,168,187,
240,272,262,195,236,244,285,249,326,345,392,268,
455,320,418,453,468,422,447,400,323,516,478,327,
450,487,535,658,573,606,659,543,655,722,677,567,
519,703,886,793,719,816,812,730,698,831,969,736,
855)

April 1997 -- January 2005
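
For a first look (a minimal sketch, assuming the MB vector above
has been entered):

  plot(MB, type = "b", xlab = "Month (1 = April 1997)",
       ylab = "Archive size")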


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 26-Feb-05   Time: 10:31:05
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] R-help Time Series

2005-02-26 Thread Prof Brian Ripley
I think
1) You have the units wrong: these appear to be the figures quoted for KB 
of compressed files, and the compression is nothing like 1024:1.

2) This is not `a series' unless you add a time base, e.g. via a call 
to ts().
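
For instance (a minimal sketch, assuming the figures are entered as
a vector MB and that the series really does start in April 1997):

  MB.ts <- ts(MB, start = c(1997, 4), frequency = 12)  # monthly time base
  plot(MB.ts)                                          # against calendar time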

Surely subscribers are aware that they do not get many MB/day and that 
extrapolation to that level is just speculation.

--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK    Fax:  +44 1865 272595


Re: [R] R-help Time Series

2005-02-26 Thread Ted Harding
On 26-Feb-05 Prof Brian Ripley wrote:
 I think
 
 1) You have the units wrong: these appear to be the figures
 quoted for KB of compressed files, and the compression is
 nothing like 1024:1.

Sorry, yes, you are correct: it is KB and not MB (a slip of the
eye on my part).

 2) This is not `a series' unless you add a time base, e.g. via
 a call to ts().

Well, in R terms that is strictly correct; but a sequence of
data corresponding to successive regular time points is usually
described as a time series!

 Surely subscribers are aware that they do not get many MB/day
 and that extrapolation to that level is just speculation.

Granted (see above). But anyway, this sort of thing is not
the real point, which is that (regardless of units) this
'sequence' of data has interesting features (which prompted
me to submit my somewhat tongue-in-cheek posting).

The original (quoted below) is now suitably amended.

 On Sat, 26 Feb 2005 [EMAIL PROTECTED] wrote:
 
 Hi Folks,

 While I was browsing in the R-help archives yesterday,
 I got curious about the time series of the sizes of
 the monthly archives in KB [was MB].

 This turned out to have an unexpected feature or two,
 which I leave to readers to explore for themselves.

 I'm now wondering at what point in time we might expect
 to be receiving 1000KB/month (30+KB/day) [was MB]. It's
 not that far away, it seems, but there are a couple of
 interesting modelling questions behind it.

 In particular, I wonder by what mechanism the numbers
 grow, according to the law which the data seem to indicate.

 Over to you.

 (just my 0.001 MB worth ... excluding headers)

 Ted

 To save you the trouble, the following sets up the
 sequence [was series; and MB]:

 KB <- c(55,19,19,18,19,17,35,27,47,
 55,32,50,55,41,49,50,28,53,42,81,54,
 99,60,84,80,76,75,78,61,83,97,141,122,
 96,144,173,153,226,202,131,165,183,175,168,187,
 240,272,262,195,236,244,285,249,326,345,392,268,
 455,320,418,453,468,422,447,400,323,516,478,327,
 450,487,535,658,573,606,659,543,655,722,677,567,
 519,703,886,793,719,816,812,730,698,831,969,736,
 855)

 April 1997 -- January 2005

 






Re: [R] R-help Time Series

2005-02-26 Thread Gabor Grothendieck


There were some discussions on this about a year ago:

http://tolstoy.newcastle.edu.au/R/help/04/04/1071.html

http://tolstoy.newcastle.edu.au/R/help/04/04/1095.html

http://tolstoy.newcastle.edu.au/R/help/04/04/1109.html



Re: [R] R-help Time Series

2005-02-26 Thread Uwe Ligges
Prof Brian Ripley wrote:
Surely subscribers are aware that they do not get many MB/day and that 
extrapolation to that level is just speculation.

So let's be immensely unfair and do some speculation ...
Assuming 1000MB/month means a compressed archive file of (very) 
*roughly* 250MB.

Looking at the data with linear models,
   lm(sqrt(MB) ~ monthindex)
seems not to be the worst model (removing the first observation, perhaps).
So a very *rough* extrapolation shows us that we will get
1000MB/month around 2142.
I hope I'll get a better machine in 137 years to handle all the expected 
traffic. ;-)
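
A rough sketch of that calculation (assuming the MB vector from the
original posting, a month index starting at 1 for April 1997, and my
crude compression guess, so that 1000MB/month of mail is roughly a
250MB, i.e. 256000KB, compressed archive):

  monthindex <- seq_along(MB)
  fit <- lm(sqrt(MB) ~ monthindex, subset = -1)  # drop first observation
  target <- sqrt(250 * 1024)                     # sqrt of 256000 (KB)
  hit <- (target - coef(fit)[1]) / coef(fit)[2]  # month index at target
  1997 + (hit + 2) / 12                          # lands around the year 2142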

Best,
Uwe





Re: [R] R-help Time Series

2005-02-26 Thread Ted Harding
On 26-Feb-05 Uwe Ligges wrote:
 So let's be immensely unfair and do some speculation ...
 
 Assuming 1000MB/month means a compressed archive file of (very) 
 *roughly* 250MB.
 
 Looking at the data with linear models,
 lm(sqrt(MB) ~ monthindex)
 seems not to be the worst model (removing the first observation,
 perhaps).

Well, Uwe, disregarding the confusion over units, your model
(and caveat) coincides with mine, leading to

  KB = (2.448731 + 0.282701*T + noise)^2

where T is in months and the first observation is at T=1.
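
As a sketch, those coefficients come from a fit along these lines
(assuming the KB vector from my amended posting; note that T here
masks R's TRUE shorthand):

  T <- seq_along(KB)              # month index, T = 1 at April 1997
  fit <- lm(sqrt(KB) ~ T)
  coef(fit)                       # about 2.4487 and 0.2827
  plot(T, sqrt(KB)); abline(fit)  # close to linear on the sqrt scale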

(Actually, I think the initial burn-in might be a bit longer,
say 3-4 months, and there is a slight suggestion that the growth
has been flattening out recently.)

And what intrigued me is the question: what mechanism might
lead to a quadratic growth law?

One possible interpretation is the following:

Suppose that the number of postings is proportional to the
number of R users. A quadratic has first difference linear in T.
So the average number of additional postings per month can be
seen as a sum of two components:

a) a constant kernel
b) a component proportional to the increment in postings
   in the previous month.

Interpreting this as number of users, it could suggest that
recruitment to R could be due to two causes: recruitment by
a core of fixed size, and recruitment by recent recruits!
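
A quick sketch of that check (again assuming the KB vector): if the
growth law is quadratic in T, the monthly increments should scatter
about a straight line:

  d <- diff(KB)                   # additional postings per month
  plot(d, xlab = "T", ylab = "diff(KB)")
  abline(lm(d ~ seq_along(d)))    # roughly linear, as a quadratic implies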

Of course this is far from the only possibility. Another might
be that postings to R-help reflect the number of issues that
users are concerned to get help or information on.

This might reflect:

a) a core of die-hard FAQs asked by more and more people;
b) a growing corpus of packages which more and more people
   need guidance with.

And so on. An essential missing piece of data (as far as I'm
concerned) is the month-by-month number of subscribers to
R-help.

Ted.



