Is this another discussion of what data might be collected and analyzed, and what could and could not be said if we only had such data? Has anyone but me produced any actual data? If so, I missed it.

Hadley mentioned the 'fortunes' package. My earlier methodology, "RSiteSearch('library(fortunes)')", produced 40 hits for 'fortunes', compared to 169 for 'lme4' and 2 for 'DierckxSpline'.

With anything like this, it would be wise to approach the problem from many different perspectives, recognizing that the strengths of one approach can help improve our understanding of what other analyses say about the question at hand.

Happy Sunday.
Spencer Graves

(Ted Harding) wrote:
On 08-Mar-09 15:14:03, Duncan Murdoch wrote:
On 08/03/2009 10:49 AM, hadley wickham wrote:
More seriously : I don't think relative numbers of package downloads
can be interpreted in any reasonable way, because reasons for
package download have a very wide range from curiosity ("what's
this ?"), fun (think "fortunes"...), to vital need (think lme4
if/when a consensus on denominator DFs can be reached :-)...).
What can you infer in good faith from such a mess ?
So when we have messy data with measurement error, we should just
give up?  Doesn't sound very statistical! ;)
I think the situation is worse than messy. If a client comes in with data that doesn't address the question they're interested in, I think they are better served by being told so than by being given an answer that is not actually valid. They should also be told how to design a study that actually does address their question.

You (and others) have mentioned Google Analytics as a possible way to address the quality of data; that's helpful. But analyzing bad data will just give bad conclusions.
Duncan Murdoch

The population of R users (which we would need to sample in order
to obtain good data) is probably more elusive than a fish population
in the ocean -- only partially visible at best, and with an unknown
proportion invisible.

At least in Fisheries research, there are long established capture
techniques (from trawling to netting to electro-fishing to ... )
which can be deployed, for research purposes, in such a way as to
potentially reach all members of a target population, with at least
a moderately good approximation to random sampling. What have we
for R?

Come to think of it, electro-fishing, ...

Suppose R were released with 2 types of cookie embedded in base R.
Each type is randomly configured, when R is first run, to be Active
or Inactive (probability of activation to be decided at the design
stage ... ). Type 1, if active, on a certain date generates an
event which brings it to the notice of R-Core (e.g. by clandestine
email or by inducing a bug report). Type 2 acts similarly on a later
date. If Type 2 acts, it carries with it information as to whether
there was a Type 1 action along with whether, apparently, the Type 1
action "succeeded".

We then have, in effect, an analogue of the Mark-Recapture technique
of population estimation (along with the usual questions about
equal catchability and so forth).
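
Purely for the sake of argument, the arithmetic behind such a scheme can be sketched in R. This is a minimal simulation, not a proposal: the population size, the two activation probabilities, and the helper names lincoln_petersen() and chapman() are all assumptions made up for illustration, and it further assumes that every active cookie reports successfully and that the two types activate independently (the "equal catchability" question is ignored).

```r
# Hypothetical two-cookie mark-recapture simulation; every number here is
# an illustrative assumption, not data about real R installations.

# Lincoln-Petersen estimator: n1 marked, n2 caught later, m caught and marked.
lincoln_petersen <- function(n1, n2, m) n1 * n2 / m

# Chapman's variant, which is less biased and stays finite when m is 0.
chapman <- function(n1, n2, m) (n1 + 1) * (n2 + 1) / (m + 1) - 1

set.seed(1)
N_true <- 100000   # assumed true number of installations (unknown in practice)
p1 <- 0.05         # assumed probability that the Type 1 cookie is active
p2 <- 0.05         # assumed probability that the Type 2 cookie is active

# Independent activation of each cookie type on each installation.
type1 <- rbinom(N_true, 1, p1) == 1
type2 <- rbinom(N_true, 1, p2) == 1

n1 <- sum(type1)           # Type 1 reports received ("marked")
n2 <- sum(type2)           # Type 2 reports received ("recaptured")
m  <- sum(type1 & type2)   # Type 2 reports that also carry a Type 1 action

N_hat <- chapman(n1, n2, m)
print(c(n1 = n1, n2 = n2, m = m,
        LP = lincoln_petersen(n1, n2, m), Chapman = N_hat))
```

The estimate is only as good as the overlap m: with small activation probabilities m is small and the estimate becomes very noisy, which is one more reason the design-stage choice of probabilities would matter.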

However, since this sort of thing (which I am not proposing seriously,
only for the sake of argument) is undoubtedly unethical (and would
do R's reputation no good if it came to light), I tentatively conclude
that the population of R users is likely to remain as elusive as ever.

Best wishes to all,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.hard...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 08-Mar-09                                       Time: 16:11:44
------------------------------ XFMail ------------------------------

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


