[R] A comment about R:

Stefan Eichenberger Fri, 06 Jan 2006 06:51:24 -0800

I spent most of the Xmas vacation getting into R and was about to ask for
pointers on how to get a handle on it when I came across this thread. I've
read through most of it and would like to comment from a novice user's point
of view. I have a strong programming background but limited statistical
experience and no knowledge of competing packages. I work as a senior
engineer in electronics.


Yes, the learning curve is steep. Most of the documentation is extremely
terse. Learning is mostly from examples (a wiki was proposed in another
mail...), and the documentation uses no graphical elements at all. So, when it
comes to things like xyplot in lattice: where would I get the concepts behind
panels, superpanels, and the like?

OK, this is steep and terse, but after a while I'll get over it... That's life.
The general concept is great, and things can be expressed very densely: the
potential is here... I quickly had 200 lines of my own code together, doing
what it should - or so I believed.

Next I did:
    matrix<-matrix(1:100, 10, 10)
    image(matrix)
    locator()
Great: I can interactively work with my graphs... But then:
    filled.contour(matrix)
    locator()
Oops - wrong coordinates returned. Bug. Apparently, locator() doesn't realize
that filled.contour() has a color bar to the right and scales x wrongly...

Here is what really shocked me:

> str(bar)
`data.frame':   206858 obs. of  12 variables:
 ...
> str(mean(bar[,6:12]))
 Named num [1:7] 1.828 2.551 3.221 1.875 0.915 ...
 ...
> str(sd(bar[,6:12]))
 Named num [1:7] 0.0702 0.1238 0.1600 0.1008 0.0465 ...
 ...
> prcomp(bar[,6:12])->foo
> str(foo$x)
 num [1:206858, 1:7] -0.4187 -0.4015  0.0218 -0.4438 -0.3650 ...
 ...
> str(mean(foo$x))
 num -1.07e-13
> str(sd(foo$x))
 Named num [1:7] 0.32235 0.06380 0.02254 0.00337 0.00270 ...
 ...

So, sd returns a vector regardless of whether the argument is a matrix or a
data.frame, but mean behaves differently and returns a vector only for a
data.frame?
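
For what it's worth, here is a minimal sketch of one way to ask for per-column
results explicitly, so the answer does not depend on whether the argument is a
matrix or a data.frame. The objects m and df are made-up toy stand-ins for
bar[,6:12] (an assumption, not my real data); colMeans() and apply() are base R.

```r
# Toy data standing in for bar[, 6:12]; the values are invented.
m  <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)   # 3 rows, 2 columns
df <- as.data.frame(m)                        # same values as a data.frame

# colMeans() accepts both a matrix and a data.frame and always
# returns one mean per column.
col_means_m  <- colMeans(m)
col_means_df <- colMeans(df)

# apply(x, 2, f) calls f on each column, again for either type.
col_sds_m  <- apply(m, 2, sd)
col_sds_df <- apply(df, 2, sd)

# mean() on a matrix treats it as one long vector: a single number.
overall_mean <- mean(m)   # 3.5
```

With per-column vectors on both sides, an expression like
col_means_m + 4 * col_sds_m no longer relies on silent recycling of a single
number.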

The problem here is not that this is difficult to learn - the problem is the
complete absence of a concept. Is a data.frame an 'extended' matrix with
columns of different types, or something different? Since the single numeric
mean (where I expected a vector) is recycled nicely when used in a vector
context, this makes debugging code close to impossible. Since sd returns a
vector, things like mean + 4*sd vary sufficiently across the data elements
that I assume the code is working... I don't get any warning signal that
something is wrong here.

A case in point is the behavior of locator() on a filled.contour() object:
things have apparently been programmed and debugged from example rather than
from concept.

Now, in another posting I read that all this is a feature to discourage
inexperienced users from statistics and force you to think before you do
things. Whilst I support this concept of thinking: did I miss something in
statistics? I was under the impression that mean and sd were relatively close
to each other conceptually... (here, they are even in different packages...)

I will continue using R for the time being. But whether I can recommend it to
my work colleagues remains to be seen: how could I ever trust the results
returned?

I'm still impressed by some of the efficiency, but my trust is deeply shaken...

--------------------------------------------------------------------------------------------------------
Stefan Eichenberger             mailto:[EMAIL PROTECTED]
--------------------------------------------------------------------------------------------------------

