Re: [R] two questions for R beginners

2010-03-25 Thread Steve Powell
For psychologists like me (possibly for others) by far the most
time-consuming detail is variable labels. I need them for just about
every analysis I do. We can use special packages like Hmisc and its
function spss.get to import the labels, but then nearly all the other
packages don't respect the labels, even simple things like list. So I
find myself either adding them back in at every step or making my own
versions of the functions. People coming from SPSS just expect the
output of basic functions like factanal to display the labels, or at
least to have the option of doing so.
Respecting/preserving variable labels in more core functions would be
an enormous help for social scientists IMHO.

What helped? Lots of things - r-seek and quick-R are my favourites,
along with amazing people who reply to problems on r-help.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] two questions for R beginners

2010-03-25 Thread Jim Lemon

Hi Steve,
From another psychologist, this is one reason that I have been 
rewriting a number of functions to read and display the 
variable.labels attribute produced by the read.spss function in the 
foreign package.


Jim



Re: [R] two questions for R beginners

2010-03-04 Thread David A.G

As a biologist recycled into biostatistics, I have always worked with Excel and 
then SPSS, and moving to R was difficult (and still is, since I am still 
learning).

Being self-taught, I learn R by looking for examples on Google, which 
often takes me to Rwiki or similar sites. I sometimes post questions, and most of 
the answers have been helpful, but sometimes the answers have been 
too short or didn't give enough hints on how to proceed, and that has stopped 
me from asking again so as not to annoy the experts. I have not answered too 
many questions from newbies, but I have tried to explain as much as I could. 
Sometimes I find it better not to answer at all than to give a short, 
vague answer. Please, examples, examples, examples!

What I found most difficult was the different data types, since I understand Excel as a 
data frame with columns and rows, and that's it. Then, as someone has already 
commented, the class, mode and str functions helped a lot. But I think that, for 
me, examples are the way to learn. 
From there I moved on to using loops, and I am still nervous when people suggest 
using *apply functions; I can't get used to them! I find loops more 
logical, and can't see how to convert them to *apply.
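
For readers in the same position, here is a minimal sketch of one loop and its sapply equivalent. The `groups` data and the variable names are invented for illustration; the point is only that the body of the loop becomes the function handed to sapply.

```r
# Hypothetical data: three groups of measurements (invented for illustration)
groups <- list(a = c(1, 2, 3), b = c(10, 20), c = c(5, 5, 5, 5))

# Loop version: pre-allocate the result, then fill it one element at a time
means_loop <- numeric(length(groups))
names(means_loop) <- names(groups)
for (i in seq_along(groups)) {
  means_loop[i] <- mean(groups[[i]])
}

# sapply version: the loop body becomes the function that is applied
# to each element of the list
means_apply <- sapply(groups, mean)
```

Both versions give the same named vector of group means; the loop is not wrong, sapply just saves the bookkeeping.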

Finally, I am not a Linux expert, and I cannot get round to installing and 
organising a proper R directory and keeping it updated. I once tried to use a 
package that needed the development version of R and was only prepared for Linux, 
but I couldn't keep the R-devel versions updated. Some more step-by-step guidance would 
help sometimes.

Thanks for a great tool!




 Date: Tue, 2 Mar 2010 12:44:23 -0600
 From: keo.orms...@gmail.com
 To: landronim...@gmail.com
 CC: r-help@r-project.org; pbu...@pburns.seanet.com
 Subject: Re: [R] two questions for R beginners
 
 Liviu Andronic wrote:
  On Mon, Mar 1, 2010 at 11:49 PM, Liviu Andronic landronim...@gmail.com 
  wrote:

  On 3/1/10, Keo Ormsby keo.orms...@gmail.com wrote:
  
   Perhaps my biggest problem was that I couldn't (and still haven't) seen
  *absolute beginners* documents.
 

  there was once a link posted on r-sig-teaching that would probably fit
  your needs, but I cannot find it now.
 
  
 
  OK, I found it. Below is an excerpt of that r-sig-teaching e-mail.
  Liviu
 
  On Thu, Jul 2, 2009 at 2:19 PM, Robert W. Hayden hay...@mv.mv.com wrote:

  I think such a website would be a real asset.  It would be most useful
  if it either were restricted to intro. stats. OR organized so that
  materials for real beginners were easy to extract from all the
  materials for programmers and Ph.D. statisticians.  As a relative
  beginner myself, I find the usual resources useless.  In self defense,
  I created materials for my own beginning students:
 
   http://courses.statistics.com/software/R/Rhome.htm
  
 Hi Liviu,
 This is indeed the best introductory site I have seen, although it 
 still assumes some things that at first might seem unintuitive to the 
 absolute beginner I am talking about. For instance, the first page 
 shows that you can do sqrt(x), where x can be a vector, and get back a 
 vector of the square roots of each number. Although this is high school 
 matrix algebra, most users expect the input to a square root function 
 to be a single number, not a matrix, as in Excel or on a calculator. Other 
 concepts that are not explicitly introduced are the R workspace, the use 
 of arguments in functions (with or without the =), etc. Others are 
 things like diff(range(rainfall)), where the output of one 
 function is used as the input to another, all in the same command line. All 
 these things seem very basic, but can be difficult if you are trying to 
 learn on your own with no prior experience in programming.
 I hope I am not sounding too difficult and contrarian, I am just trying 
 to share my experience with starting with R, and in trying to convey 
 this learning to my colleagues and students. In the end, I did find 
 everything I needed to learn, and now I feel at ease with R, and I 
 believe that almost anybody that can use Excel or something like it, 
 could learn R.
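
A short sketch of the three ideas mentioned above (vectorised functions, named versus positional arguments, and nested calls) might look like the following. The `rainfall` values are made up for illustration and are not taken from the course site.

```r
# Hypothetical rainfall figures (invented for this example)
rainfall <- c(1.2, 0.4, 3.1, 2.2)

# sqrt() works element by element: one input vector, one output vector
x <- c(4, 9, 16)
roots <- sqrt(x)

# Arguments can be matched by position or by name; both calls are equivalent
r1 <- round(3.14159, 2)
r2 <- round(3.14159, digits = 2)

# Nested call: range() returns c(min, max), and diff() subtracts them
spread <- diff(range(rainfall))

# The same computation written out step by step
rng <- range(rainfall)
spread_steps <- rng[2] - rng[1]
```

Writing the nested call out in steps first, and only then collapsing it into one line, is often easier for someone with no programming background.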
 
 Thank you for the information,
 Best wishes,
 Keo.
 


Re: [R] two questions for R beginners

2010-03-04 Thread Kevin Wright
Patrick,

1.  Implicit intercepts.  Implicit intercepts are not too bad for the main
model, but they creep in occasionally in strange places where they might not
be expected.  For example, in some of the variance structures specified in
lme, (~x) automatically expands to (~1+x). Venables said in the Exegeses
paper: For teaching purposes it would be useful to have a switch that
required users to include the intercept term in formulae if it is needed.
This would definitely help more students than it would hinder. In other words
it should be possible to override the automatic intercept term.

2.  Working with colors.  There are a number of functions in R for working
with colors and since colors can be specified by palette number, name,
hexadecimal string, values between 0 and 1, or values between 0 and 255,
things can be confusing.  One problem is that not all functions accept the
same type of arguments or produce the same type of return values.  For
example, the awkward need for t() and conversion to [0,255] when adding alpha
levels to a color:
rgb(t(col2rgb(c("navy","maroon"))),alpha=120,max=255)

3. Factors. R  tries to convert everything that it possibly can into a
factor.  Except, occasionally, it doesn't try.  Further, after sub-setting
data so that some factor levels have no data, too many functions fail.  I
shouldn't need to use drop.levels from gdata package all over the place to
keep automated scripts running smoothly.  Let's not forget:
> as.numeric(factor(c(NA,0,1)))
[1] NA  1  2

4.

> is.list(list(1)[1])
[1] TRUE

> is.matrix(matrix(1)[1,])
[1] FALSE

Ouch. Ouch. Ouch.

5. Most useful: apropos and Rseek.
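
For points 2 to 4 above, the following sketch shows the surprising behaviour next to one common workaround for each. It uses only base R; the variable names are invented for illustration, and these are not the only possible fixes.

```r
# Point 2: col2rgb() returns a 3 x n matrix on the 0-255 scale, so it is
# transposed before rgb() can read one colour per row
semi <- rgb(t(col2rgb(c("navy", "maroon"))), alpha = 120, maxColorValue = 255)

# Point 3: as.numeric() on a factor returns the internal level codes
f <- factor(c(NA, 0, 1))
codes  <- as.numeric(f)                 # NA 1 2
values <- as.numeric(as.character(f))   # NA 0 1

# Subsetting keeps unused levels; re-creating the factor drops them
g <- factor(c("a", "b", "c"))
h <- g[g != "c"]
kept    <- levels(h)           # still includes "c"
dropped <- levels(factor(h))   # only "a" and "b"

# Point 4: [ keeps the list wrapper, [[ extracts the element;
# drop = FALSE keeps the matrix dimensions
still_list   <- is.list(list(1)[1])                      # TRUE
element      <- list(1)[[1]]                             # the number 1
still_matrix <- is.matrix(matrix(1)[1, , drop = FALSE])  # TRUE
```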

Best,

Kevin


On Thu, Feb 25, 2010 at 11:31 AM, Patrick Burns pbu...@pburns.seanet.com wrote:

 * What were your biggest misconceptions or
 stumbling blocks to getting up and running
 with R?

 * What documents helped you the most in this
 initial phase?

 I especially want to hear from people who are
 lazy and impatient.

 Feel free to write to me off-list.  Definitely
 write off-list if you are just confirming what
 has been said on-list.

 --
 Patrick Burns
 pbu...@pburns.seanet.com
 http://www.burns-stat.com
 (home of 'The R Inferno' and 'A Guide for the Unwilling S User')





-- 
Kevin Wright



Re: [R] two questions for R beginners

2010-03-03 Thread Patrick Burns

I think Duncan's example of a list that is
a matrix is a compelling argument not to do
the change.

A matrix that is a list with both names and
dimnames *is* probably rare (but certainly
imaginable).  A matrix that is a list is not
so rare, and the proposed double meaning of
'$' would certainly be confusing in that case.

Pat


On 02/03/2010 17:55, Duncan Murdoch wrote:

On 02/03/2010 11:53 AM, William Dunlap wrote:

 -Original Message-
 From: r-help-boun...@r-project.org 
[mailto:r-help-boun...@r-project.org] On Behalf Of John Sorkin
 Sent: Tuesday, March 02, 2010 3:46 AM
 To: Karl Ove Hufthammer; r-h...@stat.math.ethz.ch
 Subject: Re: [R] two questions for R beginners
  Please take what follows not as an ad hominem statement, but
rather as an attempt to improve what is already an excellent
program that has been built as a result of many, many hours of
dedicated work by many, many unpaid, unsung volunteers.
  It troubles me a bit that when a confusing aspect of R is
pointed out, the response is not to try to improve the language so as
to avoid the confusion, but rather to state that the confusion is
inherent in the language. I understand that making changes that
would avoid the confusing aspect of the language that has been
discussed in this thread would take time and effort by an R wizard
(which I am not), time and effort that would not be compensated in
the traditional sense. This does not mean that we should not
acknowledge the confusion. If we want R to be the de facto lingua
franca of statistical analysis, doesn't it make sense to strive for
syntax that is as straightforward and consistent as possible?
Whenever one changes the language that way old code
will break.

I think in this case not much code would break. Mostly when people have
a matrix M and ask for M$column they'll get an error; the proposal is
that they'll get the requested column. (It is possible to have a list
with names that is also a matrix with dimnames, but I think that is a
pretty unusual construction.) But I haven't been convinced that the
proposal is a net improvement to the language.
Duncan Murdoch


The developers can, with a lot of effort,
fix their own code, and perhaps even user-written code
on CRAN, but code that thousands of users have written
will break. There is a lot of code out there that was
written by trial and error and by folks who no longer
work at an institution: the code works but no one knows
exactly why it works. Telling folks they need to change
that code because we have a cleaner but different syntax
now is not good. Why would one spend time writing a
package that might stop working when R is upgraded?

I think the solution is not to change current semantics
but to write functions that behave better and encourage
users to use them, gradually abandoning the old constructs.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
  Again, please understand that my comment is made with deepest 
respect for the many people who have unselfishly contributed  to the
R project. Many thanks to each and every one of you.
  John
Karl Ove Hufthammer k...@huftis.org 3/2/2010 4:00 AM
 On Mon, 01 Mar 2010 10:00:07 -0500 Duncan Murdoch murd...@stats.uwo.ca wrote:
  Suppose X is a dataframe or a matrix. What would you expect to
  get from X[1]? What about as.vector(X), or as.numeric(X)?

 All this of course depends on the type of object one is speaking of.
 There are plenty of surprises available, and it's best to use the
 most logical way of extracting. E.g., to extract the top-left
 element of a 2D structure (data frame or matrix), use 'X[1,1]'.

 Luckily, R provides some shortcuts. For example, you can write
 'X[2,3]' on a data frame, just as if it was a matrix, even though
 the underlying structure is completely different. (This doesn't
 work on a normal list; there you have to type the whole 'X[[3]][2]'.)

 The behaviour of the 'as.' functions may sometimes be surprising,
 at least for me. For example, 'as.data.frame' on a named vector
 gives a single-column data frame, instead of a single-row data frame.

 (I'm not sure what's the recommended way of converting a named
 vector to a one-row data frame, but 'as.data.frame(t(X))' works, even
 though both 'X' and 't(X)' look like a row of numbers.)
   The point is that a dataframe is a list, and a matrix  isn't.
If users   don't understand that, then they'll be confused
somewhere. Making   matrices more list-like in one respect will just
move the confusion   elsewhere. The solution is to understand the
difference.
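
The difference between the two structures can be checked directly. The sketch below uses a small invented data frame (`X_df`) and its matrix counterpart (`X_mat`); the names are hypothetical and chosen only for this example.

```r
X_df  <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))
X_mat <- as.matrix(X_df)

df_is_list  <- is.list(X_df)    # TRUE: a data frame is a list of columns
mat_is_list <- is.list(X_mat)   # FALSE: a matrix is an atomic vector with dims

first_df  <- X_df[1]    # list-style indexing: a one-column data frame
first_mat <- X_mat[1]   # vector-style indexing: the single top-left value

# Two-index extraction looks the same for both
same_cell <- X_df[2, 2] == X_mat[2, 2]

# as.numeric() flattens a matrix column by column
flat <- as.numeric(X_mat)

# A named vector becomes a single column unless it is transposed first
v <- c(a = 1, b = 2)
col_dim <- dim(as.data.frame(v))      # 2 rows, 1 column
row_dim <- dim(as.data.frame(t(v)))   # 1 row, 2 columns
```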
  My main problem is not understanding the difference, which is
easy, but knowing which type of object I have when I get the output of a
function in a package. If I know the object is a named vector or a
matrix with column names, it's easy enough to type
'X[,colname]', and if it's a data frame one may use the shortcut
'X$colname'.
  Usually, it *is* documented what the return value of a  function
is, but  just looking at the output is much

Re: [R] two questions for R beginners

2010-03-03 Thread Petr PIKAL
John Sorkin jsor...@grecc.umaryland.edu wrote on 01.03.2010 
15:19:10:

 If it looks like a duck and quacks like a duck, it ought to behave like 
a duck.
 
 To the user a matrix and a dataframe look alike . . . except a dataframe 
can 

Well, a matrix looks like a data.frame only at first sight.

> mat <- matrix(1:12, 3, 4)
> dat <- as.data.frame(mat)


> str(dat)
'data.frame':   3 obs. of  4 variables:
 $ V1: int  1 2 3
 $ V2: int  4 5 6
 $ V3: int  7 8 9
 $ V4: int  10 11 12

> str(mat)
 int [1:3, 1:4] 1 2 3 4 5 6 7 8 9 10 ...

These look pretty different to me.

Regards
Petr


 hold non-numeric values. Thus to the users, a matrix looks like a 
special case
 of a DF, or perhaps conversely. If you can address elements of one 
structure 
 using a given syntax, you should be able to address elements of the 
other 
 structure using the same syntax. To do otherwise leads to confusion and 
is 
 counter intuitive.
 John
 
 
 
 
 John David Sorkin M.D., Ph.D.
 Chief, Biostatistics and Informatics
 University of Maryland School of Medicine Division of Gerontology
 Baltimore VA Medical Center
 10 North Greene Street
 GRECC (BT/18/GR)
 Baltimore, MD 21201-1524
 (Phone) 410-605-7119
 (Fax) 410-605-7913 (Please call phone number above prior to faxing) 
 Petr PIKAL petr.pi...@precheza.cz 3/1/2010 8:57 AM 
 Hi
 
 r-help-boun...@r-project.org wrote on 01.03.2010 13:03:24:
 
  snip
 
   
   I understand that a 2-dimensional rectangular matrix looks quite
   similar to a data frame; however, it is only a vector with dimensions.
   As such it can have items of only one type (numeric, character, ...).
   And you can easily change the dimensions of a matrix.
   
   matrix <- 1:12
   dim(matrix) <- c(2,6)
   matrix
   dim(matrix) <- c(2,2,3)
   matrix
   dim(matrix) <- NULL
   matrix
   
   So the rectangular structure of a printed matrix is a kind of coincidence
   only, whereas the rectangular structure of a data frame is its main
   feature.
   
   Regards
   Petr
   
   -- 
   Karl Ove Hufthammer
  
  Petr, I think that could be confusing! The way I see it is that
  a matrix is a special case of an array, whose dimension attribute
  is of length 2 (number of rows, number of columns); and row
  and column refer to the rectangular display which you see when
  R prints to matrix. And this, of course, derives directly from
  the historic rectangular view of a matrix when written down.
  
  When you went from dim(matrix) <- c(2,6) to dim(matrix) <- c(2,2,3)
  you stripped it of its special title of matrix and cast it out
  into the motley mob of arrays (some of whom are matrices, but
  matrix no longer is).
  
  So the rectangular structure of printed matrix is not a coincidence,
  but is its main feature!
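
Both views can be checked with is.matrix and is.array. The sketch below repeats the dimension changes from the example above under a different variable name (`m`, chosen here to avoid masking the matrix function) and records what R reports at each step.

```r
m <- 1:12
plain_is_matrix <- is.matrix(m)     # FALSE: no dim attribute yet

dim(m) <- c(2, 6)
two_d_is_matrix <- is.matrix(m)     # TRUE: dim attribute of length 2

dim(m) <- c(2, 2, 3)
three_d_is_matrix <- is.matrix(m)   # FALSE: three dimensions
three_d_is_array  <- is.array(m)    # TRUE: still an array

dim(m) <- NULL
unchanged <- identical(m, 1:12)     # TRUE: the stored values never moved
```

So "matrix" is a property of the dim attribute, while the underlying data stay a plain vector throughout.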
 
 Ok. Point taken. However, I feel that the possibility of manipulating 
 matrix/array dimensions simply by changing them, as I showed above, 
 together with perceiving a matrix as a **vector with dimensions**, prevented 
 me, especially in the early days, from using matrices instead of data frames 
 and vice versa. 
 
 Consider the confusing results of cbind and rbind for vectors of unequal mode. 
 Far too often we see something like this
 
  > cbind(1:2,letters[1:2])
       [,1] [,2]
  [1,] "1"  "a" 
  [2,] "2"  "b" 
 
 instead of
 
  > data.frame(1:2,letters[1:2])
    X1.2 letters.1.2.
  1    1            a
  2    2            b
 
 followed by a question about why the result does not behave as expected. Each type 
 of object has some features which are good for some types of 
 manipulation/analysis/plotting but quite detrimental for others.
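
A quick way to see the coercion is to inspect the types of the two results. In this sketch the column names `num` and `let` are invented, and stringsAsFactors = FALSE is set explicitly so the outcome does not depend on the R version.

```r
# cbind() on vectors of different mode builds a matrix, so everything is
# coerced to a single type (here character)
m <- cbind(1:2, letters[1:2])
m_type <- typeof(m)                      # "character"

# data.frame() keeps each column's own type
d <- data.frame(num = 1:2, let = letters[1:2], stringsAsFactors = FALSE)
col_classes <- sapply(d, class)          # "integer" and "character"

# Arithmetic works on the data frame column directly ...
d_sum <- sum(d$num)

# ... but the matrix column has to be converted back from text first
m_sum <- sum(as.numeric(m[, 1]))
```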
 
 Regards
 Petr
 
 
  
  To come back to Karl's query about why $ works for a dataframe
  but not for a matrix, note that $ is the extractor for getting
  a named component of a list. So, Karl, when you did
  
d=head(iris[1:4])
  
  you created a dataframe:
  
str(d)
# 'data.frame':   6 obs. of  4 variables:
#  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4
#  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9
#  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7
#  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4
  
  (with named components Sepal.Length, ... , Petal.Width),
  and a dataframe is a special case of a general list. In a
  general list, the separate components can each be anything.
  In a dataframe, each component is a vector; the different
  vectors may be of different types (logical, numeric, ... )
  but of course the elements of any single vector must be
  of the same type; and, in a dataframe, all the vectors must
  have the same length (otherwise it is a general list, not
  a dataframe).
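
  The points above can be tried out on the same six rows of iris. This is
  only a sketch: the object `m` and the result names are invented for the
  example, and try() is used so that the expected errors do not stop the script.

```r
d <- head(iris[1:4])
m <- as.matrix(d)

by_dollar  <- d$Sepal.Length        # list-style extraction by name
by_double  <- d[["Sepal.Length"]]   # the same component
by_colname <- m[, "Sepal.Length"]   # matrix column selected by its dimname

# `$` on a matrix is an error, because a matrix is an atomic vector
dollar_fails <- inherits(try(m$Sepal.Length, silent = TRUE), "try-error")

# Components of unequal length are fine in a list but not in a data frame
l <- list(x = 1:3, y = letters[1:2])
df_fails <- inherits(
  try(data.frame(x = 1:3, y = letters[1:2]), silent = TRUE),
  "try-error"
)
```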
  
  So, when you print a dataframe, R chooses to display it
  as a rectangular structure. On the other hand, when you
  print a general list, R displays it quite differently:
  
d
#   Sepal.Length Sepal.Width Petal.Length Petal.Width
# 1          5.1         3.5          1.4         0.2
# 2          4.9         3.0          1.4         0.2
# 3          4.7         3.2          1.3         0.2
# 4          4.6         3.1

Re: [R] two questions for R beginners

2010-03-03 Thread John Sorkin
Petr,
On the other hand . . .

> mat <- matrix(1:12, 3, 4)
> dat <- as.data.frame(mat)
> mat
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> dat
  V1 V2 V3 V4
1  1  4  7 10
2  2  5  8 11
3  3  6  9 12

What you are demonstrating by your example is the manner in which the data are 
organized deep in the guts of R, not the way people, especially R beginners, 
visualize objects in their minds. When I think of the integer sixty-nine, I 
visualize 69, not 1000101, despite the fact that 69, as an integer, is 
represented in the computer as 1000101.
John







John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

Re: [R] two questions for R beginners

2010-03-03 Thread Petr PIKAL
Hi

That is why I consider a matrix to be just a vector with dimensions and a 
data.frame to be a rectangular structure similar to an Excel table. That has saved 
me a lot of surprises. 

But I must admit I am not a real beginner nowadays, although I still learn 
when using R, reading the help list and sometimes trying to help others.

Regards
Petr



Re: [R] two questions for R beginners

2010-03-03 Thread William Dunlap
If R made
   matrix$columnName
mean the same as
   matrix[, columnName]
(a vector) so matrices looked more like data.frames,
would we also want the following to work
as they do with data.frames?
   with(matrix, log(columnName)) # log of that column as a vector
   matrix[columnName] # 1-column matrix
   matrix[[columnName]] # vector equivalent of that 1-column matrix 
   lm(responseColumn~predictorColumn, data=matrix)
   eval(quote(columnName), envir=matrix)
The last 2 bump into the rule allowing envir to be
a frame number (since a 1x1 matrix is currently taken
as the frame number now).
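
For comparison, here is a sketch of how those idioms behave today on a data frame obtained from a matrix with as.data.frame. The matrix `m` and its column names `y` and `x` are invented for illustration; nothing here implements the proposed change.

```r
# Hypothetical matrix with column names (invented for illustration)
m <- cbind(y = c(1.1, 1.9, 3.2, 3.9), x = 1:4)

# Current matrix syntax for extracting one column
col_vec <- m[, "x"]

# The list-style idioms work once the matrix is converted to a data frame
df <- as.data.frame(m)
logs   <- with(df, log(x))             # log of that column as a vector
one_df <- df["x"]                      # one-column data frame
vec    <- df[["x"]]                    # plain vector
fit    <- lm(y ~ x, data = df)         # columns found by name
ev     <- eval(quote(x), envir = df)   # a data frame can serve as envir
```

Converting explicitly keeps the matrix semantics untouched while still giving access to the data frame conveniences.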

Perhaps the print methods for data.frame and matrix
should announce the class of the object being printed.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Patrick Burns
 Sent: Wednesday, March 03, 2010 2:44 AM
 To: r-help@r-project.org
 Subject: Re: [R] two questions for R beginners
 
 I think Duncan's example of a list that is
 a matrix is a compelling argument not to do
 the change.
 
 A matrix that is a list with both names and
 dimnames *is* probably rare (but certainly
 imaginable).  A matrix that is a list is not
 so rare, and the proposed double meaning of
 '$' would certainly be confusing in that case.
 
 Pat
 
 
 On 02/03/2010 17:55, Duncan Murdoch wrote:
  On 02/03/2010 11:53 AM, William Dunlap wrote:
   -Original Message-
   From: r-help-boun...@r-project.org 
  [mailto:r-help-boun...@r-project.org] On Behalf Of John Sorkin
   Sent: Tuesday, March 02, 2010 3:46 AM
   To: Karl Ove Hufthammer; r-h...@stat.math.ethz.ch
   Subject: Re: [R] two questions for R beginners
    Please take what follows not as an ad hominem statement, but
    rather as an attempt to improve what is already an excellent
    program, that has been built as a result of many, many hours of
    dedicated work by many, many unpaid, unsung volunteers.

    It troubles me a bit that when a confusing aspect of R is
    pointed out the response is not to try to improve the language
    so as to avoid the confusion, but rather to state that the
    confusion is inherent in the language. I understand that to make
    changes that would avoid the confusing aspect of the language
    that has been discussed in this thread would take time and effort
    by an R wizard (which I am not), time and effort that would not
    be compensated in the traditional sense. This does not mean that
    we should not acknowledge the confusion. If we want R to be the
    de facto lingua franca of statistical analysis doesn't it make
    sense to strive for syntax that is as straightforward and
    consistent as possible?

   Whenever one changes the language that way old code
   will break.

  I think in this case not much code would break. Mostly when people
  have a matrix M and ask for M$column they'll get an error; the
  proposal is that they'll get the requested column. (It is possible
  to have a list with names that is also a matrix with dimnames, but
  I think that is a pretty unusual construction.) But I haven't been
  convinced that the proposal is a net improvement to the language.

  Duncan Murdoch

   The developers can, with a lot of effort,
   fix their own code, and perhaps even user-written code
   on CRAN, but code that thousands of users have written
   will break. There is a lot of code out there that was
   written by trial and error and by folks who no longer
   work at an institution: the code works but no one knows
   exactly why it works. Telling folks they need to change
   that code because we have a cleaner but different syntax
   now is not good. Why would one spend time writing a
   package that might stop working when R is upgraded?

   I think the solution is not to change current semantics
   but to write functions that behave better and encourage
   users to use them, gradually abandoning the old constructs.

   Bill Dunlap
   Spotfire, TIBCO Software
   wdunlap tibco.com

    Again, please understand that my comment is made with deepest
    respect for the many people who have unselfishly contributed to
    the R project. Many thanks to each and every one of you.

    John

     Karl Ove Hufthammer k...@huftis.org 3/2/2010 4:00 AM
     On Mon, 01 Mar 2010 10:00:07 -0500 Duncan Murdoch
     murd...@stats.uwo.ca wrote:
      Suppose X is a dataframe or a matrix. What would you expect to
      get from X[1]? What about as.vector(X), or as.numeric(X)?

     All this of course depends on type of object one is speaking of.
     There are plenty of surprises available, and it's best to use the
     most logical way of extracting. E.g., to extract the top-left
     element of a 2D structure (data frame or matrix), use 'X[1,1]'.

     Luckily, R provides some shortcuts. For example, you can write
     'X[2,3]' on a data frame, just as if it was a matrix, even though
     the underlying structure is completely different. (This doesn't

Re: [R] two questions for R beginners

2010-03-03 Thread John Sorkin
Bill,
The points you make are well taken; one needs to know when to stop. 

I would suggest standardizing the methods used to refer to elements of a matrix 
and a dataframe and going no further. Why do I say this? A beginner, or even a 
more experienced R user, probably envisions a dataframe and a matrix as 
having the same structure, but not the same contents. Both appear to be 
multi-dimensional structures that can store data, albeit data of different 
types. A matrix stores numerical values, a dataframe stores data of mixed 
types. This being the case it makes sense to assume that 
A %*% B will work when A and B are matrices, 
but C %*% D will not work when C and D are dataframes. 
This is quite logical and intuitive. It is an extension of the truism that one 
can perform the arithmetic operation 2*3, but can't perform the 
operation "Bill"*"John" (I use quotes to indicate that the names are 
proper names and not variable names). Despite the observation that one can 
reasonably expect that there are certain operations that one can perform on 
matrices, but not on dataframes (and conversely), the apparent similarity in 
structure of the two objects makes one assume (incorrectly at this time) that 
the syntax used to access elements of an array and a dataframe should be the 
same. I submit that having similar syntax for accessing elements of the two 
structures will help users learn R. It will not cause them to assume that one 
can perform exactly the same operations on the two structures.
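
The matrix-multiplication contrast mentioned above can be sketched as follows. The objects A, B, C and D mirror the letters used in the message; their contents are invented for illustration:

```r
# Two small numeric matrices (contents are arbitrary).
A <- matrix(1:4, nrow = 2)
B <- diag(2)
AB <- A %*% B                      # works: both operands are matrices

# The same values stored as data frames.
C <- as.data.frame(A)
D <- as.data.frame(B)
CD <- try(C %*% D, silent = TRUE)  # fails: data frames are lists

# An explicit conversion makes the operation possible again.
CD_ok <- as.matrix(C) %*% as.matrix(D)
```

The explicit as.matrix() call is one way to make the intent visible; it only makes sense when every column is numeric.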

I apologize to other members of the listserver for the length of this 
subthread. It appears that I have lost the argument, and have not convinced 
those who would need to make the changes to allow matrices and dataframes to 
have similar syntax for addressing elements of the respective structures. I do 
not expect I will be adding any additional comments to this thread, but will 
continue to follow contributions other people make. Perhaps I will learn that I 
am not the only person who feels that the syntax should be consistent, but 
given what I have read so far, I doubt it. I thank everyone who has contributed 
to the discussion.
John







John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

William Dunlap wdun...@tibco.com 3/3/2010 1:15 PM 
If R made
   matrix$columnName
mean the same as
   matrix[, columnName]
(a vector) so matrices looked more like data.frames,
would we also want the following to work
as they do with data.frames?
   with(matrix, log(columnName)) # log of that column as a vector
   matrix[columnName] # 1-column matrix
   matrix[[columnName]] # vector equivalent of that 1-column matrix 
   lm(responseColumn~predictorColumn, data=matrix)
   eval(quote(columnName), envir=matrix)
The last 2 bump into the rule allowing envir to be
a frame number (since a 1x1 matrix is currently taken
as the frame number now).

Perhaps the print methods for data.frame and matrix
should announce the class of the object being printed.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  


Re: [R] two questions for R beginners

2010-03-03 Thread David Winsemius


On Mar 3, 2010, at 12:15 PM, William Dunlap wrote:


If R made
  matrix$columnName
mean the same as
  matrix[, columnName]
(a vector) so matrices looked more like data.frames,
would we also want the following to work
as they do with data.frames?
  with(matrix, log(columnName)) # log of that column as a vector
  matrix[columnName] # 1-column matrix
  matrix[[columnName]] # vector equivalent of that 1-column matrix
  lm(responseColumn~predictorColumn, data=matrix)
  eval(quote(columnName), envir=matrix)
The last 2 bump into the rule allowing envir to be
a frame number (since a 1x1 matrix is currently taken
as the frame number now).

Perhaps the print methods for data.frame and matrix
should announce the class of the object being printed.


Yes! An enthusiastic vote for highlighting this fundamental  
distinction. There is already quite enough conflation of these two  
very dissimilar object classes.


--
David Winsemius



Re: [R] two questions for R beginners

2010-03-03 Thread Jim Lemon

On 03/04/2010 08:20 AM, David Winsemius wrote:

...

Perhaps the print methods for data.frame and matrix
should announce the class of the object being printed.


Yes! An enthusiastic vote for highlighting this fundamental distinction.
There is already quite enough conflation of these two very dissimilar
object classes.

If so, please make it an option with an argument like show.class or 
print.fancy that can be set globally in options. Otherwise those of us 
who depend upon the sparse displays of R objects in our functions (e.g. 
in the prettyR package) will suffer the results.
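
A rough sketch of what such an opt-in could look like. The helper print_with_class is hypothetical and is not part of base R or prettyR; the option name show.class is simply borrowed from the suggestion above:

```r
# Hypothetical helper: prints the class first, but only when the
# (equally hypothetical) global option show.class is TRUE.
print_with_class <- function(x, ...) {
  if (isTRUE(getOption("show.class", FALSE))) {
    cat("<", paste(class(x), collapse = ", "), ">\n", sep = "")
  }
  print(x, ...)
  invisible(x)
}

df <- data.frame(a = 1:2, b = c(0.5, 1.5))
m  <- as.matrix(df)

options(show.class = TRUE)   # opt in globally
print_with_class(df)
print_with_class(m)
options(show.class = FALSE)  # back to the sparse display
```

Because the default is FALSE, code that parses or relies on the existing printed output would be unaffected unless the user switches the option on.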


Jim



Re: [R] two questions for R beginners

2010-03-03 Thread kMan
John,

I felt a short, somewhat strong reply was in order. One of the inherent
aspects of the language is that R demands more of an understanding from
users about what is taking place. Model formulae, for example, are close to
what one would use if they were to write the model on paper. I consider this
a strong feature. The confusing aspects that you point out are not the
result of syntax. Syntax in R is well specified and, I believe, far easier
to work with than many programming languages.

English is a confusing language. C++ is a confusing language. One may have
far more success learning, say, French if he/she does not like the syntax or
grammar of English, or visual Pascal if the syntax of C++ is not preferred,
rather than changing the language. If one wants to do business in a
particular area, then it generally behooves one to suck it up and learn the
native tongue or hire someone for that part. If one wants the program that
is the standard for other world class statistics packages, which also
happens to have a very amenable license agreement, then it behooves one to
suck it up and learn R. 

R is what it is. If someone does not like it, he/she can use something else,
pay far more for an inferior product which will also take longer to do a
calculation and handle less data at once, while risking that the content of
their understanding of statistics is diminished for it. Not that there is
not room for development in R, but the sort of development you demand will
evolve according to similar laws as those that govern economics and/or
change in spoken language.

You'd need major financial backing, and a strong influence over the culture
of those who use R to pull this off. Other than that, you'll have to wait
for the dialect to change over time from the cumulative effect of
contributions from people the world over who all want something different
out of the language.

If someone wants to take on the R challenge for him/herself, however, then
there is likely no better technical support in the world than the R
community, albeit perhaps after dispensing with some of the niceties.

Sincerely,
KeithC.

-Original Message-
From: John Sorkin [mailto:jsor...@grecc.umaryland.edu] 
Sent: Tuesday, March 02, 2010 4:46 AM
To: Karl Ove Hufthammer; r-h...@stat.math.ethz.ch
Subject: Re: [R] two questions for R beginners

Please take what follows not as an ad hominem statement, but rather as an
attempt to improve what is already an excellent program, that has been built
as a result of many, many hours of dedicated work by many, many unpaid,
unsung volunteers.

It troubles me a bit that when a confusing aspect of R is pointed out the
response is not to try to improve the language so as to avoid the confusion,
but rather to state that the confusion is inherent in the language. I
understand that to make changes that would avoid the confusing aspect of the
language that has been discussed in this thread would take time and effort
by an R wizard (which I am not), time and effort that would not be
compensated in the traditional sense. This does not mean that we should not
acknowledge the confusion. If we want R to be the de facto lingua franca of
statistical analysis doesn't it make sense to strive for syntax that is as
straightforward and consistent as possible? 

Again, please understand that my comment is made with deepest respect for
the many people who have unselfishly contributed to the R project. Many
thanks to each and every one of you.

John



Re: [R] two questions for R beginners

2010-03-02 Thread Karl Ove Hufthammer
On Tue, 2 Mar 2010 08:58:25 +1300 Peter Alspach 
peter.alsp...@plantandfood.co.nz wrote:
 This brings up another confusion for new users.  Simply typing the
 object name at the command line gives just one view of the object (that
 provided by print()).

Good point. Any good introduction to R should include a brief discussion 
of 'str'. But sometimes even 'str' can keep you from discovering the 
real underlying structure of an object, e.g. for data frames. The 
solution is to use 'unclass' first.
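
A small sketch of that tip, using an invented data frame df. It compares what 'str' reports before and after 'unclass' removes the class attribute:

```r
# Made-up example data.
df <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))

str(df)             # summarised as a data.frame

raw <- unclass(df)  # drop the class attribute, keep everything else
str(raw)            # now displayed as a plain list with attributes

is.list(raw)
is.data.frame(raw)
```

Seeing the names and row.names attributes on a bare list makes it clearer that a data frame is a list of equal-length columns plus some metadata.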

-- 
Karl Ove Hufthammer



Re: [R] two questions for R beginners

2010-03-02 Thread Karl Ove Hufthammer
On Mon, 01 Mar 2010 10:00:07 -0500 Duncan Murdoch murd...@stats.uwo.ca 
wrote:
 Suppose X is a dataframe or a matrix.  What would you expect to get from 
 X[1]?  What about as.vector(X), or as.numeric(X)?

All this of course depends on type of object one is speaking of. There 
are plenty of surprises available, and it's best to use the most logical 
way of extracting. E.g., to extract the top-left element of a 2D 
structure (data frame or matrix), use 'X[1,1]'.

Luckily, R provides some shortcuts. For example, you can write 'X[2,3]' 
on a data frame, just as if it was a matrix, even though the underlying 
structure is completely different. (This doesn't work on a normal list; 
there you have to type the whole 'X[[3]][2]'.)
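
To make the shortcut concrete, here is a small sketch with an invented data frame df and its plain-list counterpart lst. Note that in the list form the column is selected first and the row second:

```r
# Made-up data: three columns of three values each.
df  <- data.frame(a = 1:3, b = 4:6, c = 7:9)
lst <- as.list(df)

cell_df  <- df[2, 3]      # row 2, column 3, matrix-style indexing
cell_lst <- lst[[3]][2]   # third component, then its second element

# Matrix-style indexing is not defined for a plain list.
bad <- try(lst[2, 3], silent = TRUE)
inherits(bad, "try-error")
```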

The behaviour of the 'as.' functions may sometimes be surprising, at 
least for me. For example, 'as.data.frame' on a named vector gives a 
single-column data frame, instead of a single-row data frame.

(I'm not sure what's the recommended way of converting a named vector to 
a single-row data frame, but 'as.data.frame(t(X))' works, even though both 
'X' and 't(X)' look like a row of numbers.)
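
The two conversions can be compared directly. This is only a sketch with an invented named vector v:

```r
# A made-up named numeric vector.
v <- c(a = 1, b = 2, c = 3)

col_df <- as.data.frame(v)     # one column; the names become row names
row_df <- as.data.frame(t(v))  # t(v) is a 1 x 3 matrix, so one row

dim(col_df)
dim(row_df)
names(row_df)
```

The transpose works because t() turns the vector into a one-row matrix with column names, and a matrix is converted column by column.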

 The point is that a dataframe is a list, and a matrix isn't.  If users 
 don't understand that, then they'll be confused somewhere.  Making 
 matrices more list-like in one respect will just move the confusion 
 elsewhere.  The solution is to understand the difference.

My main problem is not understanding the difference, which is easy, but 
knowing which type of object I have when I get the output of a function 
in a package. If I know the object is a named vector or a matrix with 
column names, it's easy enough to type 'X[,"colname"]', and if it's a 
data frame one may use the shortcut 'X$colname'.

Usually, it *is* documented what the return value of a function is, but 
just looking at the output is much faster, and *usually* gives the 
correct answer.
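
One way to check the type instead of guessing from the printed output is sketched below. The helper describe_2d is hypothetical and only covers the three cases mentioned in this message:

```r
# Hypothetical helper: report which kind of object was returned.
describe_2d <- function(x) {
  if (is.data.frame(x)) {
    "data frame"
  } else if (is.matrix(x)) {
    "matrix"
  } else if (is.atomic(x) && !is.null(names(x))) {
    "named vector"
  } else {
    "something else"
  }
}

# Made-up objects covering each case.
df <- data.frame(a = 1:2, b = 3:4)
m  <- as.matrix(df)
v  <- c(a = 1, b = 2)

kinds <- c(describe_2d(df), describe_2d(m), describe_2d(v))
kinds
```

The data frame test comes first because a data frame would otherwise be caught by more general checks such as is.list().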

For example, 'mean' applied on a data frame gives a named vector, not a 
data frame, which is somewhat surprising (given that the columns of a 
data frame may be of different types, while the elements of a vector may 
not). (And yes, I know that it's *documented* that it returns a named 
vector.) On the other hand, perhaps it is surprising that 'mean' works 
on data frames at all. :-)
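
As far as I know, calling 'mean' directly on a data frame no longer returns per-column means in current R releases, so that part of the discussion may be historical. A sketch of two explicit alternatives, using an invented data frame df with numeric columns only:

```r
# Made-up numeric data.
df <- data.frame(height = c(150, 160, 170), weight = c(50, 60, 70))

m1 <- sapply(df, mean)  # apply mean to each column of the list
m2 <- colMeans(df)      # treats the data frame like a numeric matrix

class(m1)
names(m1)
```

Both calls return a named numeric vector rather than a data frame, which is the behaviour the message describes.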

-- 
Karl Ove Hufthammer



Re: [R] two questions for R beginners

2010-03-02 Thread Liviu Andronic
On Mon, Mar 1, 2010 at 11:49 PM, Liviu Andronic landronim...@gmail.com wrote:
 On 3/1/10, Keo Ormsby keo.orms...@gmail.com wrote:
  Perhaps my biggest problem was that I couldn't (and still haven't) seen
 *absolute beginners* documents.

 there was once a link posted on r-sig-teaching that would probably fit
 your needs, but I cannot find it now.


OK, I found it. Below is an excerpt of that r-sig-teaching e-mail.
Liviu

On Thu, Jul 2, 2009 at 2:19 PM, Robert W. Hayden hay...@mv.mv.com wrote:
 I think such a website would be a real asset.  It would be most useful
 if it either were restricted to intro. stats. OR organized so that
 materials for real beginners were easy to extract from all the
 materials for programmers and Ph.D. statisticians.  As a relative
 beginner myself, I find the usual resources useless.  In self defense,
 I created materials for my own beginning students:

  http://courses.statistics.com/software/R/Rhome.htm



Re: [R] two questions for R beginners

2010-03-02 Thread John Sorkin
Please take what follows not as an ad hominem statement, but rather as an 
attempt to improve what is already an excellent program, that has been built as 
a result of many, many hours of dedicated work by many, many unpaid, unsung 
volunteers.

It troubles me a bit that when a confusing aspect of R is pointed out the 
response is not to try to improve the language so as to avoid the confusion, 
but rather to state that the confusion is inherent in the language. I 
understand that to make changes that would avoid the confusing aspect of the 
language that has been discussed in this thread would take time and effort by 
an R wizard (which I am not), time and effort that would not be compensated in 
the traditional sense. This does not mean that we should not acknowledge the 
confusion. If we want R to be the de facto lingua franca of statistical 
analysis doesn't it make sense to strive for syntax that is as straightforward 
and consistent as possible? 

Again, please understand that my comment is made with deepest respect for the 
many people who have unselfishly contributed to the R project. Many thanks to 
each and every one of you.

John




Re: [R] two questions for R beginners

2010-03-02 Thread Duncan Murdoch

John Sorkin wrote:

Please take what follows not as an ad hominem statement, but rather as an 
attempt to improve what is already an excellent program, that has been built as 
a result of many, many hours of dedicated work by many, many unpaid, unsung 
volunteers.

It troubles me a bit that when a confusing aspect of R is pointed out the 
response is not to try to improve the language so as to avoid the confusion, 
but rather to state that the confusion is inherent in the language. I 
understand that to make changes that would avoid the confusing aspect of the 
language that has been discussed in this thread would take time and effort by 
an R wizard (which I am not), time and effort that would not be compensated in 
the traditional sense. This does not mean that we should not acknowledge the 
confusion. If we want R to be the de facto lingua franca of statistical 
analysis doesn't it make sense to strive for syntax that is as straightforward 
and consistent as possible? 
  


I think you've misunderstood the argument.  It would not be hard to make 
the suggested change.  I don't object to it because it would be too much 
work, I object to it because I think it is not an improvement.  
Dataframes and matrices are different, and there is no way to avoid that 
fact. 
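
A short sketch of how the difference shows up in practice. The objects df and m are invented for illustration:

```r
# Made-up objects: a data frame with mixed column types and an
# integer matrix with column names.
df <- data.frame(n = 1:2, s = c("a", "b"))
m  <- cbind(n = 1:2, k = 3:4)

is.list(df)        # a data frame is a list of columns
is.list(m)         # a matrix is an atomic vector with dimensions

length(df)         # number of columns
length(m)          # number of cells

first_df <- df[1]  # single index on a list: a one-column data frame
first_m  <- m[1]   # single index on a matrix: one element
as.vector(m)       # the cells, column by column
```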


The arguments in favour of the change seem to be these:

- Dataframes and matrices are similar in some respects, so they should 
be similar in more.


In fact, I believe that the source of confusion is the fact that they 
are similar, so this would not improve things.  People would still be 
confused by the differences, which are unavoidable.


- Using $ to extract a column of a matrix would be convenient.

I agree, it saves 4 keystrokes to type X$column instead of 
X[,"column"].  But I think it increases confusion, so the savings are 
not worthwhile.  For example, the col2rgb function returns a matrix with 
rows named "red", "green" and "blue".  But under your proposal, I'd still 
need to use X["red",] to extract the red component, because columns are 
components, but rows are not.  You are complaining that the lack of $ 
for matrices is an unnecessary asymmetry, and unnecessary asymmetries 
are confusing.  But your proposal introduces a new one!
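
The col2rgb case can be tried directly. In this sketch the colour names and the labels sky and leaf are arbitrary choices:

```r
# col2rgb() returns a matrix with one column per colour and rows
# named red, green and blue.
rgb_m <- grDevices::col2rgb(c(sky = "skyblue", leaf = "darkgreen"))

rownames(rgb_m)
red_part <- rgb_m["red", ]   # red component of every colour
sky_col  <- rgb_m[, "sky"]   # all three components of one colour

# As things stand, $ is not defined for a matrix.
res <- try(rgb_m$sky, silent = TRUE)
inherits(res, "try-error")
```

Even if $ were extended to matrix columns, the row lookup would still need the bracket form, which is the asymmetry pointed out above.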


 - Some functions return matrices when I expect a dataframe, or vice versa.

That will continue to be true regardless of whether the proposed change 
is made.  You need to read the documentation.  If it is unclear, it 
should be improved; the language shouldn't be changed so that sloppy 
documentation becomes accurate.


 - You suggested this so anyone who disagrees must be lazy.

Which really is an ad hominem argument, despite your disclaimer.  I 
think you should respect the fact that there are people who disagree 
with the value of your suggestion.   (Which is also an ad hominem 
attack, but isn't central to my argument.)


Duncan Murdoch


Again, please understand that my comment is made with deepest respect for the 
many people who have unselfishly contributed to the R project. Many thanks to 
each and every one of you.

John


  

Karl Ove Hufthammer k...@huftis.org 3/2/2010 4:00 AM 

On Mon, 01 Mar 2010 10:00:07 -0500 Duncan Murdoch murd...@stats.uwo.ca 
wrote:
  
Suppose X is a dataframe or a matrix.  What would you expect to get from 
X[1]?  What about as.vector(X), or as.numeric(X)?



All this of course depends on type of object one is speaking of. There 
are plenty of surprises available, and it's best to use the most logical 
way of extracting. E.g., to extract the top-left element of a 2D 
structure (data frame or matrix), use 'X[1,1]'.


Luckily, R provides some shortcuts. For example, you can write 'X[2,3]' 
on a data frame, just as if it was a matrix, even though the underlying 
structure is completely different. (This doesn't work on a normal list; 
there you have to type the whole 'X[[2]][3]'.)


The behaviour of the 'as.' functions may sometimes be surprising, at 
least for me. For example, 'as.data.frame' on a named vector gives a 
single-column data frame, instead of a single-row data frame.


(I'm not sure what's the recommended way of converting a named vector to 
row data frame, but 'as.data.frame(t(X))' works, even though both 'X' 
and 't(X)' looks like a row of numbers.)


  
The point is that a dataframe is a list, and a matrix isn't.  If users 
don't understand that, then they'll be confused somewhere.  Making 
matrices more list-like in one respect will just move the confusion 
elsewhere.  The solution is to understand the difference.



My main problem is not understanding the difference, which is easy, but 
knowing which type of object I have when I get the output of a function in a 
package. If I know the object is a named vector or a matrix with column 
names, it's easy enough to type 'X[,"colname"]', and if it's a data 
frame one may use the shortcut 'X$colname'.


Usually, it *is* documented what the return value of a function is, but 
just looking at the output is much faster, 

Re: [R] two questions for R beginners

2010-03-02 Thread Gabor Grothendieck
On Tue, Mar 2, 2010 at 7:27 AM, Duncan Murdoch murd...@stats.uwo.ca wrote:
 John Sorkin wrote:

 Please take what follows not as an ad hominem statement, but rather as an
 attempt to improve what is already an excellent program, that has been built
 as a result of many, many hours of dedicated work by many, many unpaid,
 unsung volunteers.

 It troubles me a bit that when a confusing aspect of R is pointed out the
 response is not to try to improve the language so as to avoid the confusion,
 but rather to state that the confusion is inherent in the language. I
 understand that to make changes that would avoid the confusing aspect of the
 language that has been discussed in this thread would take time and effort
 by an R wizard (which I am not), time and effort that would not be
 compensated in the traditional sense. This does not mean that we should not
 acknowledge the confusion. If we want R to be the de facto lingua franca of
 statistical analysis doesn't it make sense to strive for syntax that is as
 straightforward and consistent as possible?

 I think you've misunderstood the argument.  It would not be hard to make the
 suggested change.  I don't object to it because it would be too much work, I
 object to it because I think it is not an improvement.  Dataframes and
 matrices are different, and there is no way to avoid that fact.
 The arguments in favour of the change seem to be these:

Users of zoo have some experience with this, since zoo uses matrices to
represent 2d time series and originally did not support $ as a column
extractor but now does.  I was originally opposed to adding it for the
reasons you state, but it was eventually added, and having used it for
some time now since it got into the package, I must say that it is very
convenient and I now regard it as a definite improvement in user
experience.  Certainly I use the feature all the time.



Re: [R] two questions for R beginners

2010-03-02 Thread Duncan Murdoch

On 02/03/2010 11:53 AM, William Dunlap wrote:

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of John Sorkin

 Sent: Tuesday, March 02, 2010 3:46 AM
 To: Karl Ove Hufthammer; r-h...@stat.math.ethz.ch
 Subject: Re: [R] two questions for R beginners
 
 Please take what follows not as an ad hominem statement, but 
 rather as an attempt to improve what is already an excellent 
 program, that has been built as a result of many, many hours 
 of dedicated work by many, many unpaid, unsung volunteers.
 
 It troubles me a bit that when a confusing aspect of R is 
 pointed out the response is not to try to improve the 
 language so as to avoid the confusion, but rather to state 
 that the confusion is inherent in the language. I understand 
 that to make changes that would avoid the confusing aspect of 
 the language that has been discussed in this thread would 
 take time and effort by an R wizard (which I am not), time 
 and effort that would not be compensated in the traditional 
 sense. This does not mean that we should not acknowledge the 
 confusion. If we want R to be the de facto lingua franca of 
 statistical analysis doesn't it make sense to strive for 
 syntax that is as straightforward and consistent as possible? 


Whenever one changes the language that way old code
will break. 
I think in this case not much code would break.  Mostly when people have 
a matrix M and ask for M$column they'll get an error; the proposal is 
that they'll get the requested column.  (It is possible to have a list 
with names that is also a matrix with dimnames, but I think that is a 
pretty unusual construction.)  But I haven't been convinced that the 
proposal is a net improvement to the language. 


Duncan Murdoch


 The developers can, with a lot of effort,
fix their own code, and perhaps even user-written code
on CRAN, but code that thousands of users have written
will break.  There is a lot of code out there that was
written by trial and error and by folks who no longer
work at an institution: the code works but no one knows
exactly why it works.  Telling folks they need to change
that code because we have a cleaner but different syntax
now is not good.  Why would one spend time writing a
package that might stop working when R is upgraded?

I think the solution is not to change current semantics
but to write functions that behave better and encourage
users to use them, gradually abandoning the old constructs.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

 
 Again, please understand that my comment is made with deepest 
 respect for the many people who have unselfishly contributed 
 to the R project. Many thanks to each and every one of you.
 
 John
 
 
  Karl Ove Hufthammer k...@huftis.org 3/2/2010 4:00 AM 
 On Mon, 01 Mar 2010 10:00:07 -0500 Duncan Murdoch 
 murd...@stats.uwo.ca 
 wrote:
  Suppose X is a dataframe or a matrix.  What would you 
 expect to get from 
  X[1]?  What about as.vector(X), or as.numeric(X)?
 
 All this of course depends on type of object one is speaking 
 of. There 
 are plenty of surprises available, and it's best to use the 
 most logical 
 way of extracting. E.g., to extract the top-left element of a 2D 
 structure (data frame or matrix), use 'X[1,1]'.
 
 Luckily, R provides some shortcuts. For example, you can 
 write 'X[2,3]' 
 on a data frame, just as if it was a matrix, even though the 
 underlying 
 structure is completely different. (This doesn't work on a 
 normal list; 
 there you have to type the whole 'X[[2]][3]'.)
 
 The behaviour of the 'as.' functions may sometimes be surprising, at 
 least for me. For example, 'as.data.frame' on a named vector gives a 
 single-column data frame, instead of a single-row data frame.
 
 (I'm not sure what's the recommended way of converting a 
 named vector to 
 row data frame, but 'as.data.frame(t(X))' works, even though both 'X' 
 and 't(X)' looks like a row of numbers.)
 
  The point is that a dataframe is a list, and a matrix 
 isn't.  If users 
  don't understand that, then they'll be confused somewhere.  Making 
  matrices more list-like in one respect will just move the confusion 
  elsewhere.  The solution is to understand the difference.
 
 My main problem is not understanding the difference, which is 
 easy, but 
 knowing which type of I have when I get the output a function in a 
 package. If I know the object is a named vector or a matrix 
 with column 
 names, it's easy enough to type 'X[,colname]', and if it's a data 
 frame one may use the shortcut 'X$colname'.
 
 Usually, it *is* documented what the return value of a 
 function is, but 
 just looking at the output is much faster, and *usually* gives the 
 correct answer.
 
 For example, 'mean' applied on a data frame gives a named 
 vector, not a 
 data frame, which is somewhat surprising (given that the columns of a 
 data frame may be of different types, while the elements of a 
 vector may

Re: [R] two questions for R beginners

2010-03-02 Thread John Sorkin
William,
I agree that changing syntax can lead to problems. I don't, however, think 
extending the language will break existing code. Providing a common syntax for 
accessing matrices and dataframes will not change the way things have been done 
to date, but rather how things will be done in the future.
John  
John Sorkin
jsor...@grecc.umaryland.edu 
-Original Message-
From: William Dunlap wdun...@tibco.com
To: John Sorkin jsor...@grecc.umaryland.edu
To: Karl Ove Hufthammer k...@huftis.org
To:  r-h...@stat.math.ethz.ch

Sent: 3/2/2010 11:53:45 AM
Subject: RE: [R] two questions for R beginners

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of John Sorkin
 Sent: Tuesday, March 02, 2010 3:46 AM
 To: Karl Ove Hufthammer; r-h...@stat.math.ethz.ch
 Subject: Re: [R] two questions for R beginners
 
 Please take what follows not as an ad hominem statement, but 
 rather as an attempt to improve what is already an excellent 
 program, that has been built as a result of many, many hours 
 of dedicated work by many, many unpaid, unsung volunteers.
 
 It troubles me a bit that when a confusing aspect of R is 
 pointed out the response is not to try to improve the 
 language so as to avoid the confusion, but rather to state 
 that the confusion is inherent in the language. I understand 
 that to make changes that would avoid the confusing aspect of 
 the language that has been discussed in this thread would 
 take time and effort by an R wizard (which I am not), time 
 and effort that would not be compensated in the traditional 
 sense. This does not mean that we should not acknowledge the 
 confusion. If we want R to be the de facto lingua franca of 
 statistical analysis doesn't it make sense to strive for 
 syntax that is as straightforward and consistent as possible? 

Whenever one changes the language that way old code
will break.  The developers can, with a lot of effort,
fix their own code, and perhaps even user-written code
on CRAN, but code that thousands of users have written
will break.  There is a lot of code out there that was
written by trial and error and by folks who no longer
work at an institution: the code works but no one knows
exactly why it works.  Telling folks they need to change
that code because we have a cleaner but different syntax
now is not good.  Why would one spend time writing a
package that might stop working when R is upgraded?

I think the solution is not to change current semantics
but to write functions that behave better and encourage
users to use them, gradually abandoning the old constructs.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

 
 Again, please understand that my comment is made with deepest 
 respect for the many people who have unselfishly contributed 
 to the R project. Many thanks to each and every one of you.
 
 John
 
 
  Karl Ove Hufthammer k...@huftis.org 3/2/2010 4:00 AM 
 On Mon, 01 Mar 2010 10:00:07 -0500 Duncan Murdoch 
 murd...@stats.uwo.ca 
 wrote:
  Suppose X is a dataframe or a matrix.  What would you 
 expect to get from 
  X[1]?  What about as.vector(X), or as.numeric(X)?
 
 All this of course depends on type of object one is speaking 
 of. There 
 are plenty of surprises available, and it's best to use the 
 most logical 
 way of extracting. E.g., to extract the top-left element of a 2D 
 structure (data frame or matrix), use 'X[1,1]'.
 
 Luckily, R provides some shortcuts. For example, you can 
 write 'X[2,3]' 
 on a data frame, just as if it was a matrix, even though the 
 underlying 
 structure is completely different. (This doesn't work on a 
 normal list; 
 there you have to type the whole 'X[[2]][3]'.)
 
 The behaviour of the 'as.' functions may sometimes be surprising, at 
 least for me. For example, 'as.data.frame' on a named vector gives a 
 single-column data frame, instead of a single-row data frame.
 
 (I'm not sure what's the recommended way of converting a 
 named vector to 
 row data frame, but 'as.data.frame(t(X))' works, even though both 'X' 
 and 't(X)' looks like a row of numbers.)
 
  The point is that a dataframe is a list, and a matrix 
 isn't.  If users 
  don't understand that, then they'll be confused somewhere.  Making 
  matrices more list-like in one respect will just move the confusion 
  elsewhere.  The solution is to understand the difference.
 
 My main problem is not understanding the difference, which is 
 easy, but 
 knowing which type of I have when I get the output a function in a 
 package. If I know the object is a named vector or a matrix 
 with column 
 names, it's easy enough to type 'X[,colname]', and if it's a data 
 frame one may use the shortcut 'X$colname'.
 
 Usually, it *is* documented what the return value of a 
 function is, but 
 just looking at the output is much faster, and *usually* gives the 
 correct answer.
 
 For example, 'mean' applied on a data frame gives a named 
 vector, not a 
 data frame, which

Re: [R] two questions for R beginners

2010-03-02 Thread Keo Ormsby

Liviu Andronic wrote:

On Mon, Mar 1, 2010 at 11:49 PM, Liviu Andronic landronim...@gmail.com wrote:
  

On 3/1/10, Keo Ormsby keo.orms...@gmail.com wrote:


 Perhaps my biggest problem was that I couldn't find (and still haven't seen)
*absolute beginners* documents.

  

there was once a link posted on r-sig-teaching that would probably fit
your needs, but I cannot find it now.




OK, I found it. Below is an excerpt of that r-sig-teaching e-mail.
Liviu

On Thu, Jul 2, 2009 at 2:19 PM, Robert W. Hayden hay...@mv.mv.com wrote:
  

I think such a website would be a real asset.  It would be most useful
if it either were restricted to intro. stats. OR organized so that
materials for real beginners were easy to extract from all the
materials for programmers and Ph.D. statisticians.  As a relative
beginner myself, I find the usual resources useless.  In self defense,
I created materials for my own beginning students:

 http://courses.statistics.com/software/R/Rhome.htm


Hi Liviu,
This is indeed the best introductory site I have seen, although it 
still assumes some things that might at first seem unintuitive to the 
absolute beginner I am talking about. For instance, the first page 
shows that you can call sqrt(x), where x can be a vector, and get back a 
vector of the square roots of each number. Although this is high school 
matrix algebra, most users expect the input to a square root function 
to be a single number, not a matrix, as in Excel or on a calculator. Other 
concepts that are not explicitly introduced are the R workspace, the use 
of arguments in functions (with or without the =), etc. Others are 
things like diff(range(rainfall)), where the output of one 
function is used as the input to another, all in the same command line. All 
these things seem very basic, but can be difficult if you are trying to 
learn on your own with no prior experience in programming.
I hope I am not sounding too difficult and contrarian, I am just trying 
to share my experience with starting with R, and in trying to convey 
this learning to my colleagues and students. In the end, I did find 
everything I needed to learn, and now I feel at ease with R, and I 
believe that almost anybody that can use Excel or something like it, 
could learn R.
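
For readers who have not met these ideas before, the three points mentioned above (vectorised functions, nested calls, and arguments with or without the =) can be sketched as follows. The rainfall values are made up for illustration; they are not taken from the course site.

```r
rainfall <- c(12.5, 0, 3.2, 20.1)   # made-up values, for illustration only

# sqrt() works element by element, so a vector in gives a vector out.
roots <- sqrt(c(1, 4, 9))

# Nested calls are evaluated from the inside out:
# range() returns c(min, max), and diff() subtracts the first from the second.
spread <- diff(range(rainfall))

# Arguments can be matched by name (with =) or by position (without).
r1 <- round(3.14159, digits = 2)
r2 <- round(3.14159, 2)
```

Writing the nested call in two steps, rng <- range(rainfall) followed by diff(rng), gives the same result and may be easier to read at first.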


Thank you for the information,
Best wishes,
Keo.



Re: [R] two questions for R beginners

2010-03-01 Thread Karl Ove Hufthammer
On Thu, 25 Feb 2010 17:31:19 + Patrick Burns 
pbu...@pburns.seanet.com wrote:
 * What were your biggest misconceptions or
 stumbling blocks to getting up and running
 with R?

I didn't have any major stumbling blocks, but even after years of using 
R I didn't have a clear concept of what exactly a vector, a list and a 
data frame were, and what the differences and similarities between 
them were (and stuff like why x[i] returns a different result than 
x[[i]]).
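
A minimal sketch of the x[i] versus x[[i]] difference, using an invented list x and vector v:

```r
x <- list(a = 1:3, b = "text")   # hypothetical list

sub_list <- x[1]     # single brackets keep the container: a list of length one
element  <- x[[1]]   # double brackets extract the element itself: the vector 1:3

# For an unnamed atomic vector the two forms give the same single value.
v <- c(10, 20, 30)
same_for_atomic <- identical(v[2], v[[2]])
```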

Some things that have tripped me up are reassigning the value of T or F 
and getting very strange results afterwards (I now use only TRUE and 
FALSE). FAQ 7.31 and 7.22 have also been troublesome at times, 
especially 7.31 when used in 'for' loops.
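
Two of these traps can be reproduced in a few lines. This is only a sketch: the floating-point part illustrates the kind of comparison FAQ 7.31 is about, with values chosen for the example.

```r
# T and F are ordinary variables, unlike TRUE and FALSE, so they can be overwritten.
T <- 0
t_after <- isTRUE(T)         # FALSE: T now holds 0
rm(T)                        # remove our copy so the base definition is visible again
t_restored <- isTRUE(T)      # TRUE

# Accumulating 0.1 three times in a loop does not give exactly 0.3.
x <- 0
for (i in 1:3) x <- x + 0.1
exact_equal <- (x == 0.3)                 # FALSE
near_equal  <- isTRUE(all.equal(x, 0.3))  # TRUE: comparison with a tolerance
```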

Also I found it quite confusing that

?ifelse

works, but not

?if

(you have to type ?"if")

Also, why ?plot didn't give me the information I was looking for but
?plot.default did was rather confusing. I still experience similar 
problems with other functions. Usually 'methods' help, but some packages 
use S4 methods, which makes finding the correct help page quite 
challenging at times.
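
A small sketch of how the relevant help page can be tracked down for an S3 generic such as plot. It covers S3 methods only and uses functions from base R and the utils package.

```r
# List the S3 methods registered for the generic plot().
plot_methods <- methods(plot)
has_default  <- any(grepl("^plot\\.default", plot_methods))

# Retrieve a specific method; its help page is then ?plot.default.
plot_default <- getS3method("plot", "default")

# Reserved words such as if must be quoted when asking for help.
if_help <- help("if")   # same as typing ?"if" at the prompt
```

Printing if_help (or typing ?"if" interactively) displays the page; assigning it, as here, only looks it up.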

 * What documents helped you the most in this
 initial phase?

In the initial phase I found the Rtips 
http://pj.freefaculty.org/R/Rtips.html
extremely useful.

For understanding the difference between the various data types in R, 
Phil Spector's wonderful book 'Data Manipulation with R' was a great 
help. When reading it I finally understood things I have been wondering 
about for years. I really like the book. It's short, crystal clear and 
immensely useful.

Another very useful document of a more advanced nature is the R Inferno. 
Best read after you've been using R for some time, though.

I'm over the initial phase now, but two resources which continue to be 
of great help are http://www.rseek.org/ (mainly for searching the mailing 
list) and the 'sos' package (for finding the functions and packages I 
need). 'sos' really is great. There have been other packages/functions 
trying to do the same thing, but they have been too time-consuming and 
difficult to use (and learn), typically requiring you to first do a 
search, and then do some advanced subsetting to get useful results. This 
is similar to older search engines requiring many boolean terms to give 
the needed search results. With 'sos' I just choose some simple search 
terms describing what I'm looking for, and immediately get relevant 
results. 'sos' really is the Google of the R world. It has made a great 
impact on the discoverability of the various R functions and packages.

Lastly, the 'demo' function is seldom mentioned, and easy to overlook, 
but gives a nice (and sometimes impressive) overview of what types of 
graphics can be created with a given package. I wish more 
packages would have well-written demos. Also, I think some of the 
examples from the 'example' sections of help pages for functions could 
very well be copied to the demo of the corresponding package, e.g. a few 
of the examples of the 'xyplot' function in 'lattice'.
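
As a rough illustration of the 'demo' function, the sketch below lists the demos shipped with the 'graphics' package without running them. It assumes a standard R installation where that package's demos are present.

```r
# demo(package = ...) returns a listing object; its results component is a
# character matrix with one row per available demo.
demos <- demo(package = "graphics")
demo_items <- demos$results[, "Item"]

# To actually run one interactively you would type, for example:
#   demo(graphics)
#   example(plot)    # runs the examples from a help page instead
```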

-- 
Karl Ove Hufthammer



Re: [R] two questions for R beginners

2010-03-01 Thread Karl Ove Hufthammer
On Fri, 26 Feb 2010 11:56:10 -0800 (PST) Jack Siegrist 
jack...@eden.rutgers.edu wrote:
 What I think would be very helpful is an introduction to programming using
 R

Here you are:
A First Course in Statistical Programming with R
http://www.cambridge.org/uk/catalogue/catalogue.asp?isbn=9780521694247

-- 
Karl Ove Hufthammer



Re: [R] two questions for R beginners

2010-03-01 Thread Karl Ove Hufthammer
On Mon, 1 Mar 2010 11:02:59 +0100 Karl Ove Hufthammer k...@huftis.org 
wrote:
  * What were your biggest misconceptions or
  stumbling blocks to getting up and running
  with R?
 
 Also I found it quite confusing that

One more thing that still trips me up sometimes. '$' works on data 
frames but not on matrices (with dimnames/colnames). Even though the two 
objects *look* exactly the same, '$' on one of them works while '$' on 
the other gives a *very* confusing error message. Example:

  d=head(iris[1:4])
  d2=as.matrix(d)
  
  d
  d2
  
  d$Sepal.Width
  d2$Sepal.Width

Some functions output matrices where you would expect them to output 
data frames, and then this problem occurs. (Is there a reason why '$' 
could/should not be made to 'work' on matrices too?)
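
For anyone who hits this, here is a hedged sketch of the failure together with two workarounds that already exist in R. The variable names msg, by_name and via_df are only illustrative.

```r
d  <- head(iris[1:4])   # data frame
d2 <- as.matrix(d)      # numeric matrix with the same column names

# '$' on the matrix fails; capture the message instead of stopping.
msg <- tryCatch(d2$Sepal.Width, error = function(e) conditionMessage(e))

# Workaround 1: index the column by name.
by_name <- d2[, "Sepal.Width"]

# Workaround 2: convert back to a data frame first.
via_df <- as.data.frame(d2)$Sepal.Width
```

Indexing by name with d[, "Sepal.Width"] also works on the data frame, so it is the safer form when the type of the object is not known in advance.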

-- 
Karl Ove Hufthammer



Re: [R] two questions for R beginners

2010-03-01 Thread Petr PIKAL
Hi

r-help-boun...@r-project.org wrote on 01.03.2010 11:26:40:

 On Mon, 1 Mar 2010 11:02:59 +0100 Karl Ove Hufthammer k...@huftis.org 
 wrote:
   * What were your biggest misconceptions or
   stumbling blocks to getting up and running
   with R?
  
  Also I found it quite confusing that
 
 One more thing that still trips me up sometimes. '$' works on data 
 frames but not on matrices (with dimnames/colnames). Even though the two 

 objects *look* exactly the same, '$' on one of them works while '$' on 
 the other gives a *very* confusing error message. Example:
 
   d=head(iris[1:4])
   d2=as.matrix(d)
 
   d
   d2
 
   d$Sepal.Width
   d2$Sepal.Width
 
 Some functions output matrices where you would expect them to output 
 data frames, and then this problem occurs. (Is there a reason why '$' 
 could/should not be made to 'work' on matrices too?)

I understand that a 2-dimensional rectangular matrix looks quite similar to 
a data frame; however, it is only a vector with dimensions. As such it can 
have items of only one type (numeric, character, ...). And you can easily 
change the dimensions of a matrix.

matrix <- 1:12
dim(matrix) <- c(2,6)
matrix
dim(matrix) <- c(2,2,3)
matrix
dim(matrix) <- NULL
matrix

So the rectangular structure of a printed matrix is a kind of coincidence only, 
whereas the rectangular structure of a data frame is its main feature.

Regards
Petr


 
 -- 
 Karl Ove Hufthammer
 



Re: [R] two questions for R beginners

2010-03-01 Thread Duncan Murdoch

Karl Ove Hufthammer wrote:
On Fri, 26 Feb 2010 11:56:10 -0800 (PST) Jack Siegrist 
jack...@eden.rutgers.edu wrote:
  

What I think would be very helpful is an introduction to programming using
R



Here you are:
A First Course in Statistical Programming with R
http://www.cambridge.org/uk/catalogue/catalogue.asp?isbn=9780521694247

  
Jack also asked for it to be a big thick college textbook that takes at 
least a semester to go through, which should be a prerequisite for going 
through the Introduction to R available on CRAN.  That book (of which I 
am an author) is not big or thick.  But it is aimed at an audience who 
don't have programming experience.


Duncan Murdoch



Re: [R] two questions for R beginners

2010-03-01 Thread Duncan Murdoch

Karl Ove Hufthammer wrote:
On Mon, 1 Mar 2010 11:02:59 +0100 Karl Ove Hufthammer k...@huftis.org 
wrote:
  

* What were your biggest misconceptions or
stumbling blocks to getting up and running
with R?
  

Also I found it quite confusing that



One more thing that still trips me up sometimes. '$' works on data 
frames but not on matrices (with dimnames/colnames). Even though the two 
objects *look* exactly the same, '$' on one of them works while '$' on 
the other gives a *very* confusing error message. Example:


  d=head(iris[1:4])
  d2=as.matrix(d)
  
  d

  d2
  
  d$Sepal.Width

  d2$Sepal.Width

Some functions output matrices where you would expect them to output 
data frames, and then this problem occurs. (Is there a reason why '$' 
could/should not be made to 'work' on matrices too?)


  
The reason for the difference is that data.frames are lists organized 
into columns (so the $ handling comes from the list, where it means 
extract the component) whereas a matrix is a single vector displayed 
in columns. 

Of course, the problem is that a beginner only knows that they both look 
the same.  But I think the idea of a list is so fundamental to R that it 
needs to be something learned pretty early, so I'd rather not blur the 
distinction between dataframes and matrices.
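
The distinction can be checked directly. A minimal sketch, reusing the objects d and d2 from the example quoted above:

```r
d  <- head(iris[1:4])   # data frame: a list of four columns
d2 <- as.matrix(d)      # matrix: one numeric vector of 24 values

is.list(d)      # TRUE
is.list(d2)     # FALSE
typeof(d)       # "list"
typeof(d2)      # "double"
length(d)       # 4  (number of columns, i.e. list components)
length(d2)      # 24 (number of cells)
```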


Duncan Murdoch



Re: [R] two questions for R beginners

2010-03-01 Thread Ted Harding
On 01-Mar-10 11:09:51, Petr PIKAL wrote:
 Hi
 r-help-boun...@r-project.org wrote on 01.03.2010 11:26:40:
 On Mon, 1 Mar 2010 11:02:59 +0100 Karl Ove Hufthammer
 k...@huftis.org 
 wrote:
   * What were your biggest misconceptions or
   stumbling blocks to getting up and running
   with R?
  
  Also I found it quite confusing that
 
 One more thing that still trips me up sometimes. '$' works
 on data frames but not on matrices (with dimnames/colnames).
 Even though the two objects *look* exactly the same, '$' on
 one of them works while '$' on the other gives a *very*
 confusing error message. Example:
 
   d=head(iris[1:4])
   d2=as.matrix(d)
 
   d
   d2
 
   d$Sepal.Width
   d2$Sepal.Width
 
 Some functions output matrices where you would expect them to
 output data frames, and then this problem occurs. (Is there a
 reason why '$' could/should not be made to 'work' on matrices too?)
 
 I understand that 2 dimensional rectangular matrix looks quite
 similar to data frame however it is only a vector with dimensions.
 As such it can have items of only one type (numeric, character, ...).
 And you can easily change dimensions of matrix.
 
 matrix <- 1:12
 dim(matrix) <- c(2,6)
 matrix
 dim(matrix) <- c(2,2,3)
 matrix
 dim(matrix) <- NULL
 matrix
 
 So rectangular structure of printed matrix is a kind of coincidence
 only, whereas rectangular structure of data frame is its main feature.
 
 Regards
 Petr
 
 -- 
 Karl Ove Hufthammer

Petr, I think that could be confusing! The way I see it is that
a matrix is a special case of an array, whose dimension attribute
is of length 2 (number of rows, number of columns); and row
and column refer to the rectangular display which you see when
R prints the matrix. And this, of course, derives directly from
the historic rectangular view of a matrix when written down.

When you went from dim(matrix) <- c(2,6) to dim(matrix) <- c(2,2,3)
you stripped it of its special title of matrix and cast it out
into the motley mob of arrays (some of whom are matrices, but
matrix no longer is).

So the rectangular structure of printed matrix is not a coincidence,
but is its main feature!
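
The change of status described here can be watched step by step. This sketch uses the name x rather than matrix, simply to avoid masking the base function of that name.

```r
x <- 1:12

dim(x) <- c(2, 6)
two_d <- c(is.matrix(x), is.array(x))    # TRUE TRUE: a matrix is a 2-d array

dim(x) <- c(2, 2, 3)
three_d <- c(is.matrix(x), is.array(x))  # FALSE TRUE: still an array, no longer a matrix

dim(x) <- NULL
no_dim <- c(is.matrix(x), is.array(x))   # FALSE FALSE: a plain vector again
```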

To come back to Karl's query about why $ works for a dataframe
but not for a matrix, note that $ is the extractor for getting
a named component of a list. So, Karl, when you did

  d=head(iris[1:4])

you created a dataframe:

  str(d)
  # 'data.frame':   6 obs. of  4 variables:
  #  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4
  #  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9
  #  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7
  #  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4

(with named components Sepal.Length, ... , Petal.Width),
and a dataframe is a special case of a general list. In a
general list, the separate components can each be anything.
In a dataframe, each component is a vector; the different
vectors may be of different types (logical, numeric, ... )
but of course the elements of any single vector must be
of the same type; and, in a dataframe, all the vectors must
have the same length (otherwise it is a general list, not
a dataframe).
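
A short sketch of these two properties (columns of different types, all of the same length). The data frame df and its column names are invented for the example.

```r
# Columns may differ in type, but each column is internally homogeneous.
df <- data.frame(id = 1:3,
                 score = c(2.5, 3.5, 4.5),
                 passed = c(TRUE, FALSE, TRUE))
col_types   <- sapply(df, typeof)    # "integer" "double" "logical"
col_lengths <- sapply(df, length)    # every column has length 3

# Columns whose lengths cannot be recycled to match are rejected.
bad <- tryCatch(data.frame(a = 1:3, b = 1:2),
                error = function(e) conditionMessage(e))

# A general list has no such restriction.
ok_list <- list(a = 1:3, b = 1:2)
```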

So, when you print a dataframe, R chooses to display it
as a rectangular structure. On the other hand, when you
print a general list, R displays it quite differently:

  d
  #   Sepal.Length Sepal.Width Petal.Length Petal.Width
  # 1          5.1         3.5          1.4         0.2
  # 2          4.9         3.0          1.4         0.2
  # 3          4.7         3.2          1.3         0.2
  # 4          4.6         3.1          1.5         0.2
  # 5          5.0         3.6          1.4         0.2
  # 6          5.4         3.9          1.7         0.4

  d3 <- list(C1=c(1.1,1.2,1.3), C2=c(2.1,2.2,2.3,2.4))
  d3
  # $C1
  # [1] 1.1 1.2 1.3
  # $C2
  # [1] 2.1 2.2 2.3 2.4

Notice the similarity (though not identity) between the print
of d3 and the output of str(d). (There is a bit more hard-wired
stuff built into a dataframe, which makes it more than simply
a list with all components vectors of equal length.) However,
one could also say that the rectangular structure is its
main feature.

As to why $ will not work on matrices: a matrix, as Petr
points out, is a vector with a dimensions attribute which
has length 2 (as opposed to a general array where the length
of the dimensions attribute could be anything). Hence it is
not a list of named components in the sense of list.

Hence $ will not work with a matrix, since $ will not
be able to find any list-components, which is basically what
the error message

  d2$Sepal.Width
  # Error in d2$Sepal.Width : $ operator is invalid for atomic vectors

is telling you: d2 is an atomic vector with a length-2 dimensions
attribute. It has no list-type components for $ to get its
hands on.
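
What "an atomic vector with a length-2 dimensions attribute" looks like in practice can be sketched as follows; d2 is rebuilt here so that the snippet stands on its own.

```r
d2 <- as.matrix(head(iris[1:4]))

is.atomic(d2)                         # TRUE: no list components
attr_names <- names(attributes(d2))   # includes "dim" and "dimnames"
dim(d2)                               # 6 4: the length-2 dimensions attribute

# The column names live in the dimnames attribute, not in list component names.
col_names <- dimnames(d2)[[2]]
```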

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 01-Mar-10   Time: 12:03:21

Re: [R] two questions for R beginners

2010-03-01 Thread Karl Ove Hufthammer
On Mon, 01 Mar 2010 06:37:30 -0500 Duncan Murdoch murd...@stats.uwo.ca 
wrote:
  Some functions output matrices where you would expect them to output 
  data frames, and then this problem occurs. (Is there a reason why '$' 
  could/should not be made to 'work' on matrices too?)

 The reason for the difference is that data.frames are lists organized 
 into columns (so the $ handling comes from the list, where it means 
 extract the component) whereas a matrix is a single vector displayed 
 in columns. 

Sure, I know that. But is there a reason why the '$' can't be 
overloaded to handle the extraction, as a *convenience* to the user? 
After all, it *is* possible to extract columns by name from matrices
too (e.g., using d[,"Sepal.Width"]).

A similar type of overloading is used in the 'sp' class functions,
where you can basically treat a 'SpatialPointsDataFrame', a 
'SpatialLinesDataFrame' or a 'SpatialPolygonsDataFrame' as a data frame, 
with '$colname' indexing and '[' subsetting, even though the *internals* 
of the objects have a completely different (and very complex) structure. 
But as a convenience to the user, you don't need to bother with the 
internals, and can handle the object *as if* it were a data frame. It's 
a very comfortable way of working.
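
As a sketch of what such overloading could look like at user level, without changing R itself, one can define an S3 method for '$' on a small matrix subclass. The class name dollar_matrix and the helper as_dollar_matrix are hypothetical names invented for this example; they are not part of base R or of 'sp'.

```r
# Hypothetical helper: tag a matrix with an extra S3 class.
as_dollar_matrix <- function(m) {
  stopifnot(is.matrix(m), !is.null(colnames(m)))
  structure(m, class = "dollar_matrix")
}

# S3 method for '$': the column name arrives as a character string.
`$.dollar_matrix` <- function(x, name) {
  unclass(x)[, name]
}

dm <- as_dollar_matrix(as.matrix(head(iris[1:4])))
width <- dm$Sepal.Width   # now works, returning the column as a vector
```

This keeps ordinary matrices untouched and only changes behaviour for objects that opt in, which is one way to get the convenience without altering existing semantics.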

-- 
Karl Ove Hufthammer



Re: [R] two questions for R beginners

2010-03-01 Thread Paul Hiemstra

Jack Siegrist wrote:

My biggest impediment, as a scientist without previous programming
experience, is that the R help is not beginner-friendly. I think it is
probably great for experienced programmers and for the people who helped to
create the software, to help them  remember what they did, but I think it is
very difficult for a newcomer without a strong programming background to
learn about a new function or to discover the name of a function that you
are pretty sure should already exist. Maybe this wouldn’t matter for most
programming languages, but as free statistics software R is obviously going
to attract many scientists who want to get an analysis done and have varying
levels of experience with programming. 
  

Hi Jack,

Part of the problem is that the R community consists primarily of 
volunteers: people who answer questions on the help list in their spare 
time or during company time. The same holds for much of the online 
material. A program like Mathematica has a company behind it that hires 
people to write the online material. I don't offer this as an 
excuse for R, but it might explain why the R community is what it is.


As for the 'bashing' of new users: I agree that the experts answering 
questions can sometimes be blunt, but most often that is in 
response to questions that are very hard to answer. As I said earlier in 
this thread, asking the right question already requires some of the 
knowledge needed to answer it. So to get good, informative responses, 
a user already needs some level of knowledge.


I do want to point out that there is a posting guide for the mailing 
list that gives quite detailed instructions, such as: give the exact error 
(don't just say "R crashes"), provide the output of traceback() and 
sessionInfo(), and so on. A lot of posters do not follow these rules.


cheers,
Paul


I found it much easier to learn how to use Mathematica, using only the
online help. With R I had to buy several books to get a handle on it, which
is fine, but even the books that I have found to be most useful tend to be
didactically lacking—either too cursory or mired in unexplained programming
jargon. They are OK just not great.

What I think would be very helpful is an introduction to programming using
R, preferably a big thick college textbook that takes at least a semester to
go through, which should be a prerequisite for going through the
Introduction to R available on CRAN.

Also to do any analysis on real data you have to use the apply family of
functions to perform different functions by groups. A long introduction to
these functions, with lots of comparisons and contrasts between them would
be very helpful.
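
As a rough illustration of what such an introduction might contrast, here is a short sketch using the built-in iris data (the variable names are made up for this example). It is only a sample of the family, not a complete overview.

```r
# sapply: apply a function to every column of a data frame
col_means <- sapply(iris[1:4], mean)

# tapply: apply a function to one vector, split by a grouping factor
len_by_species <- tapply(iris$Sepal.Length, iris$Species, mean)

# apply: work over the rows (1) or the columns (2) of a matrix
m <- matrix(1:6, nrow = 2)
row_totals <- apply(m, 1, sum)

# aggregate: group-wise summaries, returned as a data frame
by_species <- aggregate(. ~ Species, data = iris, FUN = mean)

print(len_by_species)
```

Roughly speaking, sapply and apply differ in what they iterate over (list elements versus matrix margins), while tapply and aggregate differ in what they return (an array versus a data frame).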

A few random examples concerning the R help: 


In my version of R (2.7.0 on Windows XP) typing
  

?+


doesn’t do anything, but then if you type in the next line
+ ?sum
you get the “Arithmetic Operators” help page.
If you had just typed
  

?sum

in the first place you get the “Sum of Vector Elements” help page. 
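
The most likely explanation is that ?+ on its own is an incomplete expression, so R shows its continuation prompt and waits for more input. A sketch of two ways around this, and of searching for a function whose name you only half remember (the variable names are made up for illustration):

```r
# Quote the operator so the parser sees a complete expression.
# At the prompt, ?"+" does the same thing as help("+").
h_plus <- help("+")
h_sum  <- help("sum")

# apropos() lists objects whose names match a pattern, ignoring case by default
candidates <- apropos("rowsum")
print(candidates)
```

Assigning the result of help() keeps the page from being displayed; printing h_plus (or calling help("+") directly at the prompt) opens it.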


Most examples in the R help pages use way too many other functions to be
useful to a beginner. If an example uses 10 other functions besides the one
being described, chances are a beginner won’t know what one of them does,
which can set off a chain of having to look up other irrelevant functions.

Some function names in the base package are goofy, such as “rowsum” which is
used to “compute column sums across rows”, not to be confused with “rowSums”
which computes row sums.
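
A small sketch of the difference between the two, using a made-up matrix and grouping vector: rowSums adds across each row, while rowsum adds rows together within groups, column by column.

```r
m <- matrix(1:6, nrow = 3)      # rows are (1,4), (2,5), (3,6)
group <- c("a", "a", "b")       # hypothetical grouping of the three rows

per_row   <- rowSums(m)         # one total per row
per_group <- rowsum(m, group)   # one row of column totals per group

print(per_row)
print(per_group)
```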

  



--
Drs. Paul Hiemstra
Department of Physical Geography
Faculty of Geosciences
University of Utrecht
Heidelberglaan 2
P.O. Box 80.115
3508 TC Utrecht
Phone:  +3130 274 3113 Mon-Tue
Phone:  +3130 253 5773 Wed-Fri
http://intamap.geo.uu.nl/~paul



Re: [R] two questions for R beginners

2010-03-01 Thread Ted Harding
On 01-Mar-10 12:07:52, Karl Ove Hufthammer wrote:
 On Mon, 01 Mar 2010 06:37:30 -0500 Duncan Murdoch
 murd...@stats.uwo.ca 
 wrote:
  Some functions output matrices where you would expect them to output
  data frames, and then this problem occurs. (Is there a reason why
  '$' 
  could/should not be made to 'work' on matrices too?)

 The reason for the difference is that data.frames are lists organized 
 into columns (so the $ handling comes from the list, where it means 
 extract the component) whereas a matrix is a single vector displayed
 in columns. 
 
 Sure, I know that. But is there a reason why the '$' can't be 
 overloaded to handle the extraction, as a *convenience* to the user? 
 After all, it *is* possible to extract columns by name from matrices
 too (e.g., using d[,"Sepal.Width"]).
 
 A similar type of overloading is used in the 'sp' class functions,
 where you can basically treat a 'SpatialPointsDataFrame', a 
 'SpatialLinesDataFrame' or a 'SpatialPolygonsDataFrame' as a data
 frame, 
 with '$colname' indexing and '[' subsetting, even though the
 *internals* 
 of the objects have a completely different (and very complex)
 structure. 
 But as a convenience to the user, you don't need to bother with the 
 internals, and can handle the object *as if* it were a data frame. It's
 a very comfortable way of working.
 
 -- 
 Karl Ove Hufthammer

I'm not sure that SpatialPointsDataFrame is a dataframe (despite
the name)! Is it not simply a list? In which case, using $ is
what you have to do to get at its components.

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 01-Mar-10   Time: 12:25:17
-- XFMail --



Re: [R] two questions for R beginners

2010-03-01 Thread Petr PIKAL
Hi

r-help-boun...@r-project.org napsal dne 01.03.2010 13:03:24:

 snip

  
  I understand that 2 dimensional rectangular matrix looks quite
  similar to data frame however it is only a vector with dimensions.
  As such it can have items of only one type (numeric, character, ...).
  And you can easily change dimensions of matrix.
  
  matrix <- 1:12
  dim(matrix) <- c(2,6)
  matrix
  dim(matrix) <- c(2,2,3)
  matrix
  dim(matrix) <- NULL
  matrix
  
  So rectangular structure of printed matrix is a kind of coincidence
  only, whereas rectangular structure of data frame is its main feature.
  
  Regards
  Petr
  
  -- 
  Karl Ove Hufthammer
 
 Petr, I think that could be confusing! The way I see it is that
 a matrix is a special case of an array, whose dimension attribute
 is of length 2 (number of rows, number of columns); and row
 and column refer to the rectangular display which you see when
 R prints the matrix. And this, of course, derives directly from
 the historic rectangular view of a matrix when written down.
 
 When you went from dim(matrix)<-c(2,6) to dim(matrix)<-c(2,2,3)
 you stripped it of its special title of matrix and cast it out
 into the motley mob of arrays (some of whom are matrices, but
 matrix no longer is).
 
 So the rectangular structure of printed matrix is not a coincidence,
 but is its main feature!
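
A short sketch of the same sequence of dim changes, checking at each step what R considers the object to be (the names x and the state_* variables are made up for illustration):

```r
x <- 1:12

dim(x) <- c(2, 6)                              # two dimensions: a matrix
state_2d <- c(is.matrix(x), is.array(x))

dim(x) <- c(2, 2, 3)                           # three dimensions: array, no longer a matrix
state_3d <- c(is.matrix(x), is.array(x))

dim(x) <- NULL                                 # no dimensions: a plain vector again
state_0d <- c(is.matrix(x), is.array(x))

print(rbind(state_2d, state_3d, state_0d))
```

The underlying data never change; only the dim attribute does, which is what both views in this exchange are describing.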

OK, point taken. However, I feel that the possibility of manipulating 
matrix/array dimensions simply by changing them, as I showed above, 
together with thinking of a matrix as a **vector with dimensions**, kept 
me, especially in the early days, from using matrices where data frames 
were called for and vice versa. 

Consider the confusing results of cbind and rbind for vectors of unequal mode. 
Far too often we see something like

> cbind(1:2,letters[1:2])
     [,1] [,2]
[1,] "1"  "a" 
[2,] "2"  "b" 

instead of

> data.frame(1:2,letters[1:2])
  X1.2 letters.1.2.
1    1            a
2    2            b

and then a question about why the result does not behave as expected. Each type 
of object has features that are good for some kinds of 
manipulation/analysis/plotting but quite detrimental for others.

Regards
Petr


 
 To come back to Karl's query about why $ works for a dataframe
 but not for a matrix, note that $ is the extractor for getting
 a named component of a list. So, Karl, when you did
 
   d=head(iris[1:4])
 
 you created a dataframe:
 
   str(d)
   # 'data.frame':   6 obs. of  4 variables:
   #  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4
   #  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9
   #  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7
   #  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4
 
 (with named components Sepal.Length, ... , Petal.Width),
 and a dataframe is a special case of a general list. In a
 general list, the separate components can each be anything.
 In a dataframe, each component is a vector; the different
 vectors may be of different types (logical, numeric, ... )
 but of course the elements of any single vector must be
 of the same type; and, in a dataframe, all the vectors must
 have the same length (otherwise it is a general list, not
 a dataframe).
 
 So, when you print a dataframe, R chooses to display it
 as a rectangular structure. On the other hand, when you
 print a general list, R displays it quite differently:
 
   d
   #   Sepal.Length Sepal.Width Petal.Length Petal.Width
   # 1          5.1         3.5          1.4         0.2
   # 2          4.9         3.0          1.4         0.2
   # 3          4.7         3.2          1.3         0.2
   # 4          4.6         3.1          1.5         0.2
   # 5          5.0         3.6          1.4         0.2
   # 6          5.4         3.9          1.7         0.4
 
   d3 <- list(C1=c(1.1,1.2,1.3), C2=c(2.1,2.2,2.3,2.4))
   d3
   # $C1
   # [1] 1.1 1.2 1.3
   # $C2
   # [1] 2.1 2.2 2.3 2.4
 
 Notice the similarity (though not identity) between the print
 of d3 and the output of str(d). (There is a bit more hard-wired
 stuff built into a dataframe which makes it more than simply
 a list with all components vectors of equal length.) However,
 one could also say that the rectangular structure is its
 main feature.
 
 As to why $ will not work on matrices: a matrix, as Petr
 points out, is a vector with a dimensions attribute which
 has length 2 (as opposed to a general array where the length
 of the dimensions attribute could be anything). Hence it is
 not a list of named components in the sense of list.
 
 Hence $ will not work with a matrix, since $ will not
 be able to find any list-components, which is basically what
 the error message
 
   d2$Sepal.Width
   # Error in d2$Sepal.Width : $ operator is invalid for atomic vectors
 
 is telling you: d2 is an atomic vector with a length-2 dimensions
 attribute. It has no list-type components for $ to get its
 hands on.
 
 Ted.
 
 
 E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
 Fax-to-email: +44 (0)870 094 0861
 Date: 

Re: [R] two questions for R beginners

2010-03-01 Thread Duncan Murdoch

Karl Ove Hufthammer wrote:
On Mon, 01 Mar 2010 06:37:30 -0500 Duncan Murdoch murd...@stats.uwo.ca 
wrote:
  
Some functions output matrices where you would expect them to output 
data frames, and then this problem occurs. (Is there a reason why '$' 
could/should not be made to 'work' on matrices too?)
  
  
The reason for the difference is that data.frames are lists organized 
into columns (so the $ handling comes from the list, where it means 
extract the component) whereas a matrix is a single vector displayed 
in columns. 



Sure, I know that. But is there a reason why the '$' can't be 
overloaded to handle the extraction, as a *convenience* to the user? 
  


See the second paragraph of my response.

Duncan Murdoch

After all, it *is* possible to extract columns by name from matrices
too (e.g., using d[,"Sepal.Width"]).

A similar type of overloading is used in the 'sp' class functions,
where you can basically treat a 'SpatialPointsDataFrame', a 
'SpatialLinesDataFrame' or a 'SpatialPolygonsDataFrame' as a data frame, 
with '$colname' indexing and '[' subsetting, even though the *internals* 
of the objects have a completely different (and very complex) structure. 
But as a convenience to the user, you don't need to bother with the 
internals, and can handle the object *as if* it were a data frame. It's 
a very comfortable way of working.







Re: [R] two questions for R beginners

2010-03-01 Thread John Sorkin
If it looks like a duck and quacks like a duck, it ought to behave like a duck.

To the user a matrix and a dataframe look alike . . . except a dataframe can 
hold non-numeric values. Thus to the users, a matrix looks like a special case 
of a DF, or perhaps conversely. If you can address elements of one structure 
using a given syntax, you should be able to address elements of the other 
structure using the same syntax. To do otherwise leads to confusion and is 
counterintuitive.
John




John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

Re: [R] two questions for R beginners

2010-03-01 Thread Ted Harding
On 01-Mar-10 13:57:08, Petr PIKAL wrote:
 Hi
 r-help-boun...@r-project.org napsal dne 01.03.2010 13:03:24:
  snip
  I understand that 2 dimensional rectangular matrix looks quite
  similar to data frame however it is only a vector with dimensions.
  As such it can have items of only one type (numeric, character,
  ...).
  And you can easily change dimensions of matrix.
  
  matrix <- 1:12
  dim(matrix) <- c(2,6)
  matrix
  dim(matrix) <- c(2,2,3)
  matrix
  dim(matrix) <- NULL
  matrix
  
  So rectangular structure of printed matrix is a kind of coincidence
  only, whereas rectangular structure of data frame is its main
  feature.
  
  Regards
  Petr
  
  -- 
  Karl Ove Hufthammer
 
 Petr, I think that could be confusing! The way I see it is that
 a matrix is a special case of an array, whose dimension attribute
 is of length 2 (number of rows, number of columns); and row
 and column refer to the rectangular display which you see when
 R prints the matrix. And this, of course, derives directly from
 the historic rectangular view of a matrix when written down.
 
 When you went from dim(matrix)<-c(2,6) to dim(matrix)<-c(2,2,3)
 you stripped it of its special title of matrix and cast it out
 into the motley mob of arrays (some of whom are matrices, but
 matrix no longer is).
 
 So the rectangular structure of printed matrix is not a coincidence,
 but is its main feature!
 
 Ok. Point taken. However I feel that possibility to manipulate 
 matrix/array dimensions by simple changing them as I  showed above 
 together with perceiving matrix as a **vector with dimensions**
 prevented me especially in early days from using matrices instead
 of data frames and vice versa. 
 
 Consider cbind and rbind confusing results for vectors with unequal
 mode. 
 Far too often we can see something like that
 
 cbind(1:2,letters[1:2])
      [,1] [,2]
 [1,] "1"  "a" 
 [2,] "2"  "b" 
 
 instead of
 
 data.frame(1:2,letters[1:2])
   X1.2 letters.1.2.
 1    1            a
 2    2            b
 
 and then a question why does not the result behave as expected.
 Each type of object has some features which is good for some
 type of manipulation/analysis/plotting but quite detrimental
 for others.
 
 Regards
 Petr
 
 [the rest from my previous reply stripped]

Well, it depends what one means by "as expected"!

One of the things about R which many (and that certainly includes
me) have to find out the hard way is that you have to *learn*
what to expect! You can't just import it from prior experience in
other contexts. So, by the time you have learned that a matrix
is such that all its elements must have the same type, whereas
the components of a list (or as special case the columns of a
dataframe) can be of different types, you expect the first result
(your cbind(1:2,letters[1:2])): R can coerce the numerical
elements to character type, but letters have too much character
to allow themselves be coerced into numerical.

What can be confusing is that the on-screen output of your
data.frame(1:2,letters[1:2]) does not exhibit the "" quotes
which identify character type, so that what appears on screen
does not inform the user of what is going on. In particular,
one cannot learn from the display that the 1 and 2 are numbers
and not characters. Since "a" and "b" are displayed without "", and
so are the 1 and 2, the 1 and 2 could be either -- until you
check it out by other means.
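
One way of "checking it out by other means" is sketched below, with a made-up data frame whose two columns print identically although one is numeric and the other character (the names df, m and col_classes are assumptions for this example):

```r
# Two columns that look the same when printed
df <- data.frame(n = 1:2, ch = c("1", "2"), stringsAsFactors = FALSE)
print(df)

# str() and class() reveal what the print method hides
str(df)
col_classes <- sapply(df, class)
print(col_classes)

# For the matrix built by cbind(), typeof() shows the coercion to character
m <- cbind(1:2, letters[1:2])
print(typeof(m))
```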

I think it is not that one should object to the behaviours of
the different types of objects. What really matters is that
one needs to learn what they are and why they behave as they do,
and not be misled by appearances.

And this includes the sometimes unpredictable behaviours of the
print methods.

In the context of this thread, I think the issues that have
been raised by the current discussion on matrices and dataframes
should receive sympathetic attention in Patrick Burns' project.

To close:

  cbind(c(pi,pi),letters[1:2])
  #      [,1]               [,2]
  # [1,] "3.14159265358979" "a" 
  # [2,] "3.14159265358979" "b" 
  data.frame(c(pi,pi),letters[1:2])
  #   c.pi..pi. letters.1.2.
  # 1  3.141593            a
  # 2  3.141593            b
  pi
  # [1] 3.141593
  as.character(pi)
  # [1] "3.14159265358979"

That raises a few questions about expectations too!
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 01-Mar-10   Time: 14:50:50
-- XFMail --



Re: [R] two questions for R beginners

2010-03-01 Thread Duncan Murdoch

On 01/03/2010 9:19 AM, John Sorkin wrote:

If it looks like a duck and quacks like a duck, it ought to behave like a duck.

To the user a matrix and a dataframe look alike . . . except a dataframe can 
hold non-numeric values. Thus to the users, a matrix looks like a special case 
of a DF, or perhaps conversely. If you can address elements of one structure 
using a given syntax, you should be able to address elements of the other 
structure using the same syntax. To do otherwise leads to confusion and is 
counter intuitive.
  


Suppose X is a dataframe or a matrix.  What would you expect to get from 
X[1]?  What about as.vector(X), or as.numeric(X)?


The point is that a dataframe is a list, and a matrix isn't.  If users 
don't understand that, then they'll be confused somewhere.  Making 
matrices more list-like in one respect will just move the confusion 
elsewhere.  The solution is to understand the difference.
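
The questions above can be tried out directly. This is only a sketch with a made-up three-row example (the names df, m and the result variables are assumptions), but it shows the same syntax giving different kinds of answers for the two structures:

```r
df <- data.frame(a = 1:3, b = 4:6)
m  <- as.matrix(df)

first_df <- df[1]   # list-style indexing: a one-column data frame
first_m  <- m[1]    # vector-style indexing: a single element

flat_m <- as.vector(m)   # the matrix is one vector underneath

# A data frame is a list, so it does not flatten to numbers this way
num_df <- tryCatch(as.numeric(df), error = function(e) "error")

print(c(is.list(df), is.list(m)))
```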


Duncan Murdoch


Re: [R] two questions for R beginners

2010-03-01 Thread Karl Ove Hufthammer
On Mon, 01 Mar 2010 09:09:11 -0500 Duncan Murdoch murd...@stats.uwo.ca 
wrote:
  The reason for the difference is that data.frames are lists organized 
  into columns (so the $ handling comes from the list, where it means 
  extract the component) whereas a matrix is a single vector displayed 
  in columns. 
 
  Sure, I know that. But is there a reason why the '$' can't be 
  overloaded to handle the extraction, as a *convenience* to the user? 
 
 See the second paragraph of my response.

OK. So I take it that there are no *technical* reasons why '$' can't be 
made to work for matrices and named vectors? I tried redefining it for 
matrices with

`$.matrix`=function(x, name) ... something ...

but I still get an error message when trying to use it.

Of course I agree that 'the idea of a list is so fundamental to R that 
it needs to be something learned pretty early', but is there any harm in 
slightly 'blur[ing] the distinction between dataframes and matrices', as 
a convenience to the user? Or, in other words, what does one *gain* by 
having '$' on named matrices and vectors give a confusing error message 
instead of the expected results? Distinction for distinction's own 
sake is of little use.

In case anyone is wondering about the vector case (of which matrices are 
of course only a special case), here is an example:

> d=iris[,1:4]
> d1=head(d,1)
> d2=mean(d)
> 
> d1
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
> d2
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
    5.843333     3.057333     3.758000     1.199333 
> 
> d1$Sepal.Width
[1] 3.5
> d2$Sepal.Width
Error in d2$Sepal.Width : $ operator is invalid for atomic vectors

-- 
Karl Ove Hufthammer



Re: [R] two questions for R beginners

2010-03-01 Thread hadley wickham
 One of the things about R which many (and that certainly includes
 me) have to find out the hard way is that you have to *learn*
 what to expect! You can't just import it from prior experience in
 other contexts. So, by the time you have learned that a matrix
 is such that all its elements must have the same type, whereas
 the components of a list (or as special case the columns of a
 dataframe) can be of different types, you expect the first result
 (your cbind(1:2,letters[1:2])): R can coerce the numerical
 elements to character type, but letters have too much character
 to allow themselves be coerced into numerical.

cbind is not the best example, because it has rather complex behaviour:

cbind(1:2, letters[1:2])
cbind(1:2, letters[1:2], data.frame(1:2))
cbind(cbind(1:2, letters[1:2]), data.frame(1:2))
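
To see what each of these three calls returns without relying on the printed output, one can inspect the results directly. A sketch (the names r1, r2 and r3 are made up for illustration):

```r
r1 <- cbind(1:2, letters[1:2])                         # no data frame involved
r2 <- cbind(1:2, letters[1:2], data.frame(1:2))        # one argument is a data frame
r3 <- cbind(cbind(1:2, letters[1:2]), data.frame(1:2)) # a character matrix plus a data frame

print(class(r1))
print(sapply(r2, class))   # the first column stays numeric
print(sapply(r3, class))   # the first column was already coerced inside the matrix
```

As soon as a data frame is among the arguments the result is a data frame, and whether the numbers survive as numbers depends on whether they passed through a matrix first.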

Hadley
-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] two questions for R beginners

2010-03-01 Thread Gustaf Rydevik
On Mon, Mar 1, 2010 at 4:02 PM, Karl Ove Hufthammer k...@huftis.org wrote:
 On Mon, 01 Mar 2010 09:09:11 -0500 Duncan Murdoch murd...@stats.uwo.ca
 wrote:
  The reason for the difference is that data.frames are lists organized
  into columns (so the $ handling comes from the list, where it means
  extract the component) whereas a matrix is a single vector displayed
  in columns.
 
  Sure, I know that. But is there a reason why the '$' can't be
  overloaded to handle the extraction, as a *convenience* to the user?

 See the second paragraph of my response.

 OK. So I take it that there are no *technical* reasons why '$' can't be made to
 work for matrices and named vectors? I tried redefining it for matrices
 with

 `$.matrix`=function(x, name) ... something ...

 but I still get an error message when trying to use it.

 Of course I agree that 'the idea of a list is so fundamental to R that
 it needs to be something learned pretty early', but is there any harm in
 slightly 'blur[ing] the distinction between dataframes and matrices', as
 a convenience to the user? Or, in other words, what does one *gain* by
 having '$' on named matrices and vectors give a confusing error message
 instead of the expected results? Distinction for distinction's own
 sake is of little use.

 In case anyone is wondering about the vector case (of which matrices are
 of course only a special case), here is an example:

 d=iris[,1:4]
 d1=head(d,1)
 d2=mean(d)

 d1
   Sepal.Length Sepal.Width Petal.Length Petal.Width
 1          5.1         3.5          1.4         0.2
 d2
 Sepal.Length  Sepal.Width Petal.Length  Petal.Width
     5.843333     3.057333     3.758000     1.199333

 d1$Sepal.Width
 [1] 3.5
 d2$Sepal.Width
 Error in d2$Sepal.Width : $ operator is invalid for atomic vectors

 --
 Karl Ove Hufthammer


As a technical exercise, I wrote the following function:

 '%W%'<-function(e1,e2)e1[,which(colnames(e1)%in%e2)]

temp<-matrix(1:6,nrow=2,dimnames=list(a=1:2,b=c("a","b","c")))
temp%W%"b"


I assume that the reason you can't use $.matrix is that $ is a
primitive function and doesn't use the UseMethod function.
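
A sketch that may help narrow this down. As far as I understand it, '$' does dispatch internally to S3 methods, but only for objects that carry an explicit class attribute, and a plain matrix has none. The class name "named_matrix" below is made up for the illustration and is not an existing R class.

```r
plain <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))

# S3 method for a hypothetical class: look the name up among the columns
`$.named_matrix` <- function(x, name) unclass(x)[, name]

tagged <- plain
class(tagged) <- "named_matrix"   # explicit class attribute enables dispatch

col_b <- tagged$b                 # uses the method defined above

# The untagged matrix still fails, because no dispatch takes place
plain_result <- tryCatch(plain$b, error = function(e) conditionMessage(e))

print(col_b)
print(plain_result)
```

Tagging every matrix with an extra class is of course not something base functions do, so this does not make '$' work on the matrices they return.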

/Gustaf
-- 
Gustaf Rydevik, M.Sci.
tel: +46(0)703 051 451
address:Essingetorget 40,112 66 Stockholm, SE
skype:gustaf_rydevik



Re: [R] two questions for R beginners

2010-03-01 Thread Karl Ove Hufthammer
On Mon, 01 Mar 2010 12:25:20 - (GMT) ted.hard...@manchester.ac.uk 
ted.hard...@manchester.ac.uk wrote:
  A similar type of overloading is used in the 'sp' class functions,
  where you can basically treat a 'SpatialPointsDataFrame', a 
  'SpatialLinesDataFrame' or a 'SpatialPolygonsDataFrame' as a data
  frame, 
  with '$colname' indexing and '[' subsetting, even though the
  *internals* 
  of the objects have a completely different (and very complex)
  structure. 
  But as a convenience to the user, you don't need to bother with the 
  internals, and can handle the object *as if* it were a data frame. It's
  a very comfortable way of working.
 
 I'm not sure that SpatialPointsDataFrame is a dataframe (despite
 the name)! Is it not simply a list? In which case, using $ is
 what you have to do to get at its components.

That it's not a data frame is the point. :-)

And it is not simply a list; it's an S4 object with the data (frame) stored 
in a 'data' slot, and '$' overloaded so you can use it *as if* it were a 
data frame. Example:

library(sp)
example("SpatialPolygonsDataFrame-class")

# Internal structure (warning: not pretty!)
str(ex_1.7$x)

# Extracting columns from the data frame
ex_1.7$z

# Both 'nrow' and '[' are overloaded, so you can use '[' 
# for normal subsetting. For example, to plot 10 random
# polygons, you can type 
ex.sub=ex_1.7[sample(nrow(ex_1.7), 10), ]
plot(ex.sub)

In most cases you don't have to worry about how everything is stored 
internally; things just work like you expect them to.
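
To make the overloading idea concrete without needing 'sp', here is a minimal sketch with a made-up S4 class. The class name "Wrapped" and its slots are purely hypothetical and much simpler than the real 'sp' classes; the point is only that '$' and '[' can be given methods that forward to a data frame held in a slot:

```r
library(methods)

# Hypothetical S4 class holding a data frame plus some extra information.
setClass("Wrapped", slots = c(data = "data.frame", note = "character"))

# '$' forwards to the data frame in the 'data' slot.
setMethod("$", "Wrapped", function(x, name) x@data[[name]])

# '[' subsets rows of the data frame and returns an object of the same class.
setMethod("[", "Wrapped", function(x, i, j, ..., drop = TRUE) {
  x@data <- x@data[i, , drop = FALSE]
  x
})

w <- new("Wrapped", data = data.frame(z = c(10, 20, 30)), note = "demo")
w$z              # column 'z', as if w were a data frame
w[c(1, 3), ]$z   # row subsetting keeps the wrapper, so '$' still works
```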

-- 
Karl Ove Hufthammer



Re: [R] two questions for R beginners

2010-03-01 Thread Karl Ove Hufthammer
On Mon, 01 Mar 2010 14:50:57 - (GMT) ted.hard...@manchester.ac.uk 
ted.hard...@manchester.ac.uk wrote:
   as.character(pi)
   # [1] "3.14159265358979"
 
 That raises a few questions about expectations too!

Expectations can indeed be dangerous. I have been bitten by this one:

as.numeric(as.character(pi))

It works fine in the US, but not in Europe. :)

Hint: Try options(OutDec=",")
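
A small sketch of the underlying problem, using format() with an explicit decimal mark rather than the global option (so it does not depend on how a particular R version treats OutDec in as.character): as.numeric() only understands "." as the decimal separator, so text written with a comma has to be converted back first.

```r
txt <- format(pi, decimal.mark = ",")    # text with a comma as decimal mark
bad <- suppressWarnings(as.numeric(txt)) # NA: the comma is not parsed

# Replace the comma before converting back to a number.
back <- as.numeric(sub(",", ".", txt, fixed = TRUE))
```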

-- 
Karl Ove Hufthammer



Re: [R] two questions for R beginners

2010-03-01 Thread Patrick Burns

If it looks like a duck and quacks like a duck,
you ought to treat it like a duck.  That is,
use two subscripts:

x[i, j]

If you are an ornithologist, then you will know
more precisely what can be done.

Pat


On 01/03/2010 14:19, John Sorkin wrote:

If it looks like a duck and quacks like a duck, it ought to behave like a duck.

To the user a matrix and a dataframe look alike . . . except a dataframe can 
hold non-numeric values. Thus to the users, a matrix looks like a special case 
of a DF, or perhaps conversely. If you can address elements of one structure 
using a given syntax, you should be able to address elements of the other 
structure using the same syntax. To do otherwise leads to confusion and is 
counter intuitive.
John




John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

Petr PIKAL petr.pi...@precheza.cz  3/1/2010 8:57 AM
Hi

r-help-boun...@r-project.org wrote on 01.03.2010 13:03:24:

  snip



I understand that 2 dimensional rectangular matrix looks quite
similar to data frame however it is only a vector with dimensions.
As such it can have items of only one type (numeric, character, ...).
And you can easily change dimensions of matrix.

matrix <- 1:12
dim(matrix) <- c(2,6)
matrix
dim(matrix) <- c(2,2,3)
matrix
dim(matrix) <- NULL
matrix

So rectangular structure of printed matrix is a kind of coincidence
only, whereas rectangular structure of data frame is its main feature.

Regards
Petr


--
Karl Ove Hufthammer


Petr, I think that could be confusing! The way I see it is that
a matrix is a special case of an array, whose dimension attribute
is of length 2 (number of rows, number of columns); and row
and column refer to the rectangular display which you see when
R prints to matrix. And this, of course, derives directly from
the historic rectangular view of a matrix when written down.

When you went from dim(matrix) <- c(2,6) to dim(matrix) <- c(2,2,3)
you stripped it of its special title of matrix and cast it out
into the motley mob of arrays (some of whom are matrices, but
matrix no longer is).

So the rectangular structure of printed matrix is not a coincidence,
but is its main feature!
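
A short sketch of this point, checking what R itself reports as the dim attribute changes (the variable name x is just for illustration):

```r
x <- 1:12

dim(x) <- c(2, 6)
two_d <- c(matrix = is.matrix(x), array = is.array(x))    # a matrix, hence an array

dim(x) <- c(2, 2, 3)
three_d <- c(matrix = is.matrix(x), array = is.array(x))  # an array, no longer a matrix

dim(x) <- NULL
no_dim <- c(matrix = is.matrix(x), array = is.array(x))   # back to a plain vector
```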


Ok. Point taken. However, I feel that the possibility of manipulating
matrix/array dimensions by simply changing them, as I showed above,
together with perceiving a matrix as a **vector with dimensions**, prevented
me, especially in the early days, from using matrices instead of data frames and
vice versa.

Consider the confusing results of cbind and rbind for vectors of unequal mode.
Far too often we see something like

cbind(1:2,letters[1:2])

     [,1] [,2]
[1,] "1"  "a"
[2,] "2"  "b"

instead of

data.frame(1:2,letters[1:2])

  X1.2 letters.1.2.
1    1            a
2    2            b

and then a question about why the result does not behave as expected. Each type
of object has some features which are good for some types of
manipulation/analysis/plotting but quite detrimental for others.
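
A quick way to see the difference is to check the types after each call. This is only a sketch; the column names num and chr are made up for the example:

```r
m  <- cbind(1:2, letters[1:2])            # matrix: everything coerced to character
df <- data.frame(num = 1:2, chr = letters[1:2],
                 stringsAsFactors = FALSE) # data frame: each column keeps its type

typeof(m)          # the numbers have silently become strings
sapply(df, class)  # one class per column
```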

Regards
Petr




To come back to Karl's query about why $ works for a dataframe
but not for a matrix, note that $ is the extractor for getting
a named component of a list. So, Karl, when you did

   d=head(iris[1:4])

you created a dataframe:

   str(d)
   # 'data.frame':   6 obs. of  4 variables:
   #  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4
   #  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9
   #  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7
   #  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4

(with named components Sepal.Length, ... , Petal.Width),
and a dataframe is a special case of a general list. In a
general list, the separate components can each be anything.
In a dataframe, each component is a vector; the different
vectors may be of different types (logical, numeric, ... )
but of course the elements of any single vector must be
of the same type; and, in a dataframe, all the vectors must
have the same length (otherwise it is a general list, not
a dataframe).
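
These statements can be checked directly. The following sketch uses the same d, plus a hypothetical list called ragged whose components have different lengths:

```r
d <- head(iris[1:4])

is.list(d)                                     # a data frame is a list
sapply(d, length)                              # all components have the same length
identical(d$Sepal.Width, d[["Sepal.Width"]])   # $ extracts a named component

# Components of unequal length cannot be turned into a data frame.
ragged <- list(C1 = c(1.1, 1.2, 1.3), C2 = c(2.1, 2.2, 2.3, 2.4))
res <- try(as.data.frame(ragged), silent = TRUE)
inherits(res, "try-error")
```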

So, when you print a dataframe, R chooses to display it
as a rectangular structure. On the other hand, when you
print a general list, R displays it quite differently:

   d
   #   Sepal.Length Sepal.Width Petal.Length Petal.Width
   # 1          5.1         3.5          1.4         0.2
   # 2          4.9         3.0          1.4         0.2
   # 3          4.7         3.2          1.3         0.2
   # 4          4.6         3.1          1.5         0.2
   # 5          5.0         3.6          1.4         0.2
   # 6          5.4         3.9          1.7         0.4

   d3 <- list(C1=c(1.1,1.2,1.3), C2=c(2.1,2.2,2.3,2.4))
   d3
   # $C1
   # [1] 1.1 1.2 1.3
   # $C2
   # [1] 2.1 2.2 2.3 2.4

Notice the similarity (though not identity) between the print
of d3 and the output of str(d). There is a bit more hard-wired
stuff built into 

Re: [R] two questions for R beginners

2010-03-01 Thread hadley wickham
 Suppose X is a dataframe or a matrix.  What would you expect to get from
 X[1]?  What about as.vector(X), or as.numeric(X)?

 The point is that a dataframe is a list, and a matrix isn't.  If users don't
 understand that, then they'll be confused somewhere.  Making matrices more
 list-like in one respect will just move the confusion elsewhere.  The
 solution is to understand the difference.

What I find more confusing is the behaviour of $ with vectors.  In my
mind x$a is a shortcut for writing x[["a"]], but:

 x <- list(a = 1)
 x$a
[1] 1
 x <- c(a = 1)
 x$a
Error in x$a : $ operator is invalid for atomic vectors
 x[["a"]]
[1] 1
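
For what it's worth, the two are not quite equivalent even for lists, since $ does partial matching of names while [[ is exact by default. A small sketch (the names lst and v are made up):

```r
lst <- list(alpha = 1)
lst$al                       # partial matching finds "alpha"
lst[["al"]]                  # exact matching by default: NULL
lst[["al", exact = FALSE]]   # same behaviour as $

v <- c(a = 1)
v[["a"]]                          # works on an atomic vector
res <- try(v$a, silent = TRUE)    # $ is an error on an atomic vector
```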

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] two questions for R beginners

2010-03-01 Thread Albert-Jan Roskam
I thought duck-typing was about type-independency?
I could feed the bird object bread() or carrots(), or any other method, and 
that's okay as long as the bird doesn't die. And since ducks don't like carrots 
[at least, afaik]

Quaaack! ;-)
Albert-Jan

~~
In the face of ambiguity, refuse the temptation to guess.
~~


Re: [R] two questions for R beginners

2010-03-01 Thread Duncan Murdoch

On 01/03/2010 11:33 AM, hadley wickham wrote:

 Suppose X is a dataframe or a matrix.  What would you expect to get from
 X[1]?  What about as.vector(X), or as.numeric(X)?

 The point is that a dataframe is a list, and a matrix isn't.  If users don't
 understand that, then they'll be confused somewhere.  Making matrices more
 list-like in one respect will just move the confusion elsewhere.  The
 solution is to understand the difference.

What I find more confusing is the behaviour of $ with vectors.  In my
mind x$a is a shortcut for writing x[["a"]], but:


And I still remember being surprised by the x[["a"]] behaviour!

Duncan Murdoch

 x <- list(a = 1)
 x$a
[1] 1
 x <- c(a = 1)
 x$a
Error in x$a : $ operator is invalid for atomic vectors
 x[["a"]]
[1] 1

Hadley






Re: [R] two questions for R beginners

2010-03-01 Thread Peter Alspach
 On 01/03/2010 9:19 AM, John Sorkin wrote:
 If it looks like a duck and quacks like a duck, it ought to 
 behave like a duck.
  

This brings up another confusion for new users.  Simply typing the
object name at the command line gives just one view of the object (that
provided by print()).  Real ducks fooled by decoy ducks get shot.  The
consequences of thinking a matrix and dataframe are the same are not
quite so severe :-)

Hei kona ra .

Peter Alspach



Re: [R] two questions for R beginners

2010-03-01 Thread Jim Lemon

On 03/02/2010 02:02 AM, Karl Ove Hufthammer wrote:

...
Of course I agree that 'the idea of a list is so fundamental to R that
it needs to be something learned pretty early', but is there any harm in
slightly 'blur[ing] the distinction between dataframes and matrices', as
a convenience to the user? Or, in other words, what does one *gain* by
having '$' on named matrices and vectors give a confusing error message
instead of the expected results? Distinction for distinction's own
sake is of little use.



Matrices are all one type, from finish back to start,
while lists can manage many just by keeping them apart.
Like foxes, geese and corn that must be ferried 'cross the Styx,
it's not a good idea to let the separate columns mix.
So yield not to temptation when the two appear to match,
for appearances mislead and in the substance lies the catch.
Convenience is a quicksand that can suck the user down.
'Tis better to avoid the stuff and know one's way around.

Jim



Re: [R] two questions for R beginners

2010-03-01 Thread Ted Harding
On 01-Mar-10 22:44:22, Jim Lemon wrote:
 On 03/02/2010 02:02 AM, Karl Ove Hufthammer wrote:
...
 Of course I agree that 'the idea of a list is so fundamental to R that
 it needs to be something learned pretty early', but is there any harm
 in
 slightly 'blur[ing] the distinction between dataframes and matrices',
 as
 a convenience to the user? Or, in other words, what does one *gain* by
 having '$' on named matrices and vectors give a confusing error
 message
 instead of the expected results? Distinction for distinction's own
 sake is of little use.

 
 Matrices are all one type, from finish back to start,
 while lists can manage many just by keeping them apart.
 Like foxes, geese and corn that must be ferried 'cross the Styx,
 it's not a good idea to let the separate columns mix.
 So yield not to temptation when the two appear to match,
 for appearances mislead and in the substance lies the catch.
 Convenience is a quicksand that can suck the user down.
 'Tis better to avoid the stuff and know one's way around.
 
 Jim

That HAS to be a Fortune!
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 01-Mar-10   Time: 23:08:20
-- XFMail --



Re: [R] two questions for R beginners

2010-03-01 Thread Keo Ormsby

Patrick Burns wrote:

* What were your biggest misconceptions or
stumbling blocks to getting up and running
with R?

* What documents helped you the most in this
initial phase?

I especially want to hear from people who are
lazy and impatient.

Feel free to write to me off-list.  Definitely
write off-list if you are just confirming what
has been said on-list.

Perhaps my biggest problem was that I couldn't find (and still haven't seen) 
*absolute beginners* documents. It took me about six months to start 
using R for my everyday data analysis, and now I can't imagine life 
without it!
My problem was that I knew some programming (Java) and had never used a 
command line for statistics. All my statistical needs had been 
accomplished through the graphical interface of SPSS or similar software 
(even Excel!). I have a feeling that almost all Introduction to R 
documents are made for making the switch from SPSS and SAS scripting to 
R. But I have had a very difficult time using R as an *entry level* 
statistical scripting language to help my colleagues (none of us are 
programmers or statisticians, mostly biology PhDs and a couple 
of MDs).
I would love to see a text oriented towards someone who has never used 
anything but Excel, but realizes that to do science today you have to go 
beyond the Data analysis toolbar from Excel.

(Please tell me if you know of any)
Best to all,
Keo.



Re: [R] two questions for R beginners

2010-03-01 Thread Liviu Andronic
On 3/1/10, Keo Ormsby keo.orms...@gmail.com wrote:
  Perhaps my biggest problem was that I couldn't find (and still haven't seen)
 *absolute beginners* documents.

Perhaps http://www.r-tutor.com/? Also recently a webinar on R [2] was
held and it hosts complete course notes and recordings. Otherwise,
there was once a link posted on r-sig-teaching that would probably fit
your needs, but I cannot find it now.
[2] http://www.fort.usgs.gov/brdscience/learnR.htm


 It took me about six months to start using R
 for my everyday data analysis, and now I can't imagine life without it!
  My problem was that I knew some programming (Java) and had never used a
 command line for statistics. All my statistical needs had been accomplished
 through the graphical interface of SPSS or similar software (even Excel!). I
 have a feeling that almost all Introduction to R documents are made for
 making the switch from SPSS and SAS scripting, to R. But I have had a very
 difficult time using R as an *entry level* statistical scripting language to

Maybe Rcmdr could help here? It allows performing entry level
statistical analyses, while displaying the complete syntax.
Liviu


 help my colleagues (none of us are programmers or statisticians,
 mostly biology PhDs and a couple of MDs).
  I would love to see a text oriented towards someone who has never used
 anything but Excel, but realizes that to do science today you have to go
 beyond the Data analysis toolbar from Excel.
  (Please tell me if you know of any)
  Best to all,
  Keo.





-- 
Do you know how to read?
http://www.alienetworks.com/srtest.cfm
http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader
Do you know how to write?
http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail



Re: [R] two questions for R beginners

2010-03-01 Thread Keo Ormsby

Liviu Andronic wrote:

On 3/1/10, Keo Ormsby keo.orms...@gmail.com wrote:
  

 Perhaps my biggest problem was that I couldn't find (and still haven't seen)
*absolute beginners* documents.


Perhaps http://www.r-tutor.com/? Also recently a webinar on R [2] was
held and it hosts complete course notes and recordings. Otherwise,
there was once a link posted on r-sig-teaching that would probably fit
your needs, but I cannot find it now.
[2] http://www.fort.usgs.gov/brdscience/learnR.htm

[..snip..]


Maybe Rcmdr could help here? It allows performing entry level
statistical analyses, while displaying the complete syntax.
Liviu
  

Yes, thanks. I will check the webinars.
In the beginning I did bump into Rcmdr, but it was a *package* 
that had to be downloaded from *CRAN*, and during installation it asked 
for a whole lot of options that I had no idea about at the time. 
help() always gives a description, which is very useful if 
you know what you want to do, but words like *Generic function*, or the 
ellipsis (...) *further arguments to be passed to other functions*, can 
be daunting to the uninitiated.  What I wanted to convey is that if you 
start from your own non-programmer, non-statistician area and try to go 
directly into R, you will quickly find yourself immersed in a lot of 
terminology and usability logic that is kind of alien even to proficient 
users of web browsers and Office suites. Of course there is a lot of 
information out there, much of it very simple indeed, but perhaps what I 
was looking for then was not an R for dummies, but an R for LAZY 
dummies.
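
For anyone else who tripped over those two terms on the help pages, here is a minimal sketch of what they mean in practice. The function name describe and its methods are hypothetical, invented only for this example: a *generic function* picks a method based on the class of its first argument, and the ellipsis (...) lets each method accept extra arguments that the others simply ignore.

```r
# Generic: chooses a method according to the class of x.
describe <- function(x, ...) UseMethod("describe")

# Fallback used when no more specific method exists.
describe.default <- function(x, ...) "something else"

# Method for numbers; 'digits' is an extra argument reached through '...'.
describe.numeric <- function(x, digits = 2, ...) {
  paste("mean:", round(mean(x), digits))
}

describe(c(1, 2, 4))              # uses describe.numeric
describe(c(1, 2, 4), digits = 1)  # extra argument passed along
describe("a")                     # falls back to describe.default
```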


By the way, what I meant by the 6 months was the time it took to become 
completely R dependent, to not even consider using other software.


Thanks for the links!
Best,
Keo.



Re: [R] two questions for R beginners

2010-03-01 Thread Tony B
Background: During my uni days, I was taught to use MAPLE, MATLAB,
SPSS, SAS, C++ and Java. Then after uni, several years went by without
me ever using any of them again and was told to just use Excel. Then I
started my PhD and was told I should start using R instead (something
I'd never even heard of before).

I would class myself as being just above a beginner, but not by very
much. Probably within walking distance.

 * What were your biggest misconceptions or
 stumbling blocks to getting up and running
 with R?

(1) I read a lot about R having awesome graphics capabilities, but
when I looked at the graphs on the R home page I was a little
underwhelmed. I thought Excel graphs looked better (though, to be
fair, since that first time, I have seen some pretty awesome graphics
produced with R, and even a cool animation someone posted on YouTube
synchronised to music).

(2) The whole *apply family of functions just confused me and looking
at the examples didn't really help me to be honest. I understood the
idea of vectorisation but I couldn't work out how to get what I wanted
as the end result. The plyr package has solved that issue for me
though and I now appreciate how cool these functions are.

(3) There are a lot of cool sounding packages on CRAN. Sometimes I can
read the ref manual and still have no idea how they work. A short
tutorial on how the author sees the package being used would be
helpful.

(4) Also, trivial examples are great for conveying the basics of how a
function works. Complicated examples give me a headache.

(5) I used to have issues trying to find R related material on the web
(then I discovered rseek etc).

(6) "cannot allocate vector of size..." -- I think this has to be the
most asked question ever on r-help. Not so much of a stumbling block
for me anymore, but I always got annoyed whenever I saw it.
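
On point (2), this is the kind of trivial example that would probably have helped: a sketch using the built-in iris data, one line per member of the family (the result names are made up).

```r
# sapply: apply a function to each column of a data frame, simplify the result
col_means <- sapply(iris[1:4], mean)

# tapply: apply a function to groups of a vector defined by a factor
by_species <- tapply(iris$Sepal.Length, iris$Species, mean)

# apply: apply a function over the rows (1) or columns (2) of a matrix
row_max <- apply(as.matrix(iris[1:4]), 1, max)

col_means
by_species
head(row_max)
```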

 * What documents helped you the most in this
 initial phase?

(1) R cheat sheets are fantastic because I can never remember most of
them off the top of my head.

(2) Rwiki has saved me many precious hours by having easy to follow
examples.

(3) r-help is great for trying to find answers to questions for the
most part. I've learned loads just reading responses people have
kindly contributed. Some threads can get long and it would be nice if
the original author would summarise at the end once a suitable solution
is found (some other lists do this).

(4) Random little blog posts that describe how to do a fun* task in R.
These short posts are usually the best way for me to learn, because
they don't require too much effort, are sort of easy to understand and
follow through from beginning to end, and give you a cool** end
result.

(5) I prefer 'cookbooks' that show you how to do fun stuff (and hence
learn from) as opposed to looking at the official R guides (confession
time: I haven't looked at the intro to R guide since my 1st month of
using R... which was a couple years ago now).

 * where did you look for help expecting answers, but did not find them?

Often times, the ?[function name] help pages just didn't make sense to
me, even after trying to understand the examples.

Sometimes it'd be nice to have something like they do on thottbot for
World of Warcraft where each quest has a thread for people to discuss
how it works and little tricks. So the R equivalent I guess would be
to have a link at the bottom of each help page which links to a thread
dedicated to a specific function and where users discuss it and
offer their own examples and points of view about it. Of course that
is probably overkill. I just wanted to see if I could mention WoW in
my post.

 I especially want to hear from people who are
 lazy

I did a degree in Maths.

 and impatient.

I sometimes produce graphics in Excel.

Cheers,
Tony

* = fun is a relative term. I still get a buzz out of seeing ascii
art.
** = cool is also a relative term. I still think Babylon 5 was cooler
than Star Trek DS9. Though nowhere near as cool as Doctor Who.

On 25 Feb, 17:31, Patrick Burns pbu...@pburns.seanet.com wrote:
 * What were your biggest misconceptions or
 stumbling blocks to getting up and running
 with R?

 * What documents helped you the most in this
 initial phase?

 I especially want to hear from people who are
 lazy and impatient.

 Feel free to write to me off-list.  Definitely
 write off-list if you are just confirming what
 has been said on-list.

 --
 Patrick Burns
 pbu...@pburns.seanet.com
 http://www.burns-stat.com
 (home of 'The R Inferno' and 'A Guide for the Unwilling S User')



Re: [R] two questions for R beginners

2010-03-01 Thread RICHARD M. HEIBERGER
I would love to see a text oriented towards someone who has never used
anything but Excel, but realizes that to do science today you have to go
beyond the Data analysis toolbar from Excel.
(Please tell me if you know of any)
Best to all,
Keo.

Please look at *R through Excel*, the book that Erich Neuwirth and I
published last summer.
http://www.springer.com/978-1-4419-0051-7
Erich's RExcel seamlessly integrates the entire set of R's statistical and
graphical tools into Excel.  Our book shows how to use the system in many
ways.  You can place any R command within the Excel automatic recalculation
mode, you can run Rcmdr from the Excel menu bar, and you can run arbitrary
R scripts from an Excel spreadsheet.  The full R command window is also
available.

While RExcel can be downloaded from CRAN by the RExcelInstaller package,
it is much easier to download all of R, including RExcel and Rcmdr, in a
single installer from http://rcom.univie.ac.at

Rich



Re: [R] two questions for R beginners

2010-02-27 Thread Johannes Huesing
Dieter Menne dieter.me...@menne-biomed.de [Fri, Feb 26, 2010 at 08:39:14AM CET]:
 
 
 Patrick Burns wrote:
  
  * What were your biggest misconceptions or
  stumbling blocks to getting up and running
  with R?
  
  
 (This derives partly from teaching)
 
[...]
 
 The concept of environment. With S it was worse, though.
 

Agreed, though a beginner shouldn't be exposed to this aspect.
In the beginning you can analyse away before you are drowning
in variable names if you start with simple examples.

Which plotting parameters can be passed with basic plot functions,
and which ones have to be declared with par()? How do I 
set the min and max values for the x and y axis? (This 
aspect is drowned among all the options under ?par.)
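
For the record, a minimal sketch of the distinction as I understand it: the axis ranges are arguments to plot() itself (xlim, ylim), whereas something like the figure margins (mar) can only be set through par(). The example uses the built-in cars data and a null PDF device so that it runs without opening a window.

```r
pdf(NULL)                               # null device: nothing is written to disk

old <- par(mar = c(4, 4, 1, 1))         # margins have to be set with par()
plot(cars$speed, cars$dist,
     xlim = c(0, 30), ylim = c(0, 150), # axis ranges are plot() arguments
     xlab = "speed", ylab = "dist")
usr <- par("usr")                       # plot-region limits actually in use
par(old)                                # restore the previous margins

invisible(dev.off())
```

By default the plot region extends a little beyond the requested limits, which is why usr is slightly wider than xlim and ylim.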

Generally, the help pages are built like man pages, where
all options are given more or less equal consideration, even
if one option is used almost always and one only for esoteric
purposes. Given that help() is the most intuitive thing to
look for, it may be nice to include references to other
sources like rwiki if the respective page is good, even if
it may be disruptive wrt display device.

-- 
Johannes Hüsing
mailto:johan...@huesing.name
http://derwisch.wikidot.com

There is something fascinating about science. One gets such wholesale
returns of conjecture from such a trifling investment of fact.
(Mark Twain, Life on the Mississippi)



Re: [R] two questions for R beginners

2010-02-27 Thread John Sorkin
I don't think I am a tyro, but neither am I a wizard. That being said, R has a
number of aspects that make it difficult:

Error messages that are not helpful
Manual pages that are written in Martin.
Lack of examples on some manual pages
Lack of comments in code

There are other hurdles. The concept of vectorization and its related syntax
took me a long time to understand.
John
John Sorkin
jsor...@grecc.umaryland.edu 
-Original Message-
From: Saeed Abu Nimeh sabun...@gmail.com
Cc:  r-help@r-project.org
To:  ivan.calan...@uni-hamburg.de

Sent: 2/26/2010 11:36:38 PM
Subject: Re: [R] two questions for R beginners

Hi Ivan,

On 2/26/10 6:30 AM, Ivan Calandra wrote:
 You are definitely right...
 What to do with bad beginner's questions is not a simple issue.

 If a beginner's mailing list is created, who will answer to such
 questions?

If I subscribe to the beginners mailing list, then I have to expect 
novice questions and I should be willing to help. Otherwise, I should 
not be there.

And moreover, the beginners won't take advantage of the other
 questions (I've personally learned a lot trying to understand the
 questions and answers to other's problems).

They can still subscribe to the advanced, but they will know that they 
are here to observe and learn, not to ask novice questions. You want to 
ask basic stuff, go to the beginners list :)

Not sure if you guys have been on some of the linux mailing lists out 
there, but man let me tell you, some of these lists have a RTFM attitude 
and they will fry you if you ask novice questions. Frankly, that is 
understandable, as most of the members are geeks and they have higher 
expectations. This mailing list is different, I have seen posts from 
different disciplines; biology, biostats, stats, computer science, 
oceanography, etc. So, IMO, there should be a beginners list to cope 
with such a broad community.

Thanks,
Saeed

And also, as you said, the
 problems might persist.
 The beginner's mailing list might be good in one aspect though: the
 experts who subscribe to it would be willing to help the beginners to
 get started with R, knowing that the questions might not be clearly stated.

 As you pointed out, the mailing list is not the best for basic stuff
 (the question is of course what is basic?). Not everybody knows some
 colleagues who work with R (I'm personally the 1st one to use R in my lab).
 I think, somehow and I have no idea how, documentation and guidance to
 search for help should be more accessible as soon as you start with R.
 Maybe a _*clear*_ section on the R homepage or in the introduction to
 R manual like where to find help, including all of the most common
 and useful resources available (from ? and RSiteSearch() to R Wiki and
 Crantastic).

 I hope that this whole discussion might help to make the R world better.
 Thank you Patrick for initiating it!
 Regards,
 Ivan

 Le 2/26/2010 15:09, Paul Hiemstra a écrit :
 Ivan Calandra wrote:
 Since you want input from beginners, here are some thoughts

 I had and still have two big problems with R:
 - this vectorization thing. I've read many manuals (including R
 inferno), but I'm still not completely clear about it. In simple
 examples, it's fine. But when it gets a bit more complex, then...
 Related to it, the *apply functions are still a bit difficult to
 understand. When I have to use them, I just try one and see what
 happens. I don't understand them well enough to know which one I need.
 - the second problem is where to find the functions/packages I need.
 There are many options, and that's actually the problem. R Wiki,
 Rseek, RSiteSearch, Crantastic, etc... When you start with R, you
 discover that the capabilities of R are almost unlimited and you
 don't really know where to start, where to find what you need.

 As noted in earlier posts, the mailing list is really great, but some
 people are really hard with beginners. It was noted in a discussion a
 few days ago, but it looks like some don't realize how difficult it
 is at the beginning to formulate a good question, clear, with
 self-contained example and so on. Moreover, not everybody speaks
 English natively. I don't mean that you must help, even when the
 question is really vague and not clear and whatever. I'm just saying
 that if you don't want to help (whatever the reason), you don't have
 to say it badly. But in any cases, the mailing list is still really
 helpful. As someone noted (sorry I erased the email so I don't
 remember who), it might be a good idea to split it.
 Hi everyone,

 My 2ct about the mailing list :). I understand that beginners have a
 hard time formulating a good question. But the problem is that we
 can't answer the question when it is unclear. So either I:

 - Don't bother answering
 - Try to discuss with the author of the question, taking lots of time
 to find out what exactly the question is.
 - Send a "read the posting guide" answer

 I mostly do the first, as I have to get things done during my PhD

Re: [R] two questions for R beginners

2010-02-27 Thread xlr82sas

Hi,

I don't think you should split the list for beginners.

On the SAS list we get questions from novices such as secretaries,
janitorial services, human resources and even top executives.

They often approach SAS from a very intuitive standpoint. These questions
often shake the experts to the core. They ask themselves, why didn't I allow
R to do this.

For instance, a novice might ask of the SAS datastep language:
Why can't I just
Array X[3] (A,1.ROGER,26)

You can do the above in several other integrated SAS languages
(MACRO, SCL, SAS-C and, sort of, IML), at ~$5000+ per year for each except
macro.

A user asked recently:

   array x[2,3,4,5] x1-x120;
   Do i=1 to 2;
     Do j=1 to 3;
       Do k=1 to 4;
         Do l=1 to 5;
           Xijkl = i*j*k*l;
         End;
       End;
     End;
   End;

R can do this nicely with lists but SAS can do it with SCL,Macro,IML
and C. I think SAS-IML has the most intuitive solution.
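
For comparison, here is a hedged sketch of how the four nested loops above might look in R. It uses a base R array rather than a list; that choice is an assumption of this example, not something stated in the post.

```r
# Sketch: fill a 2 x 3 x 4 x 5 array so that x[i, j, k, l] == i * j * k * l.
# Reduce() applies outer() successively, adding one dimension per step.
x <- Reduce(outer, list(1:2, 1:3, 1:4, 1:5))

dim(x)          # 2 3 4 5
x[2, 3, 4, 5]   # 120

# The same result with explicit loops, closer to the SAS data step version.
y <- array(0, dim = c(2, 3, 4, 5))
for (i in 1:2) for (j in 1:3) for (k in 1:4) for (l in 1:5) {
  y[i, j, k, l] <- i * j * k * l
}
all(x == y)     # TRUE
```

The outer() form avoids the index bookkeeping; the loop form is there only to show that both give the same 120 values.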

I read the Nabble, perl and SAS lists religiously. What I would like to
see is one list that somehow integrated R, SAS and perl solutions. SAS users
are trying to create integrated 'DROP DOWN' capabilities that allow
programmers to switch languages midstream to get the best solution. I often
want to respond with SAS solutions, just so R and perl can think about
adding functionality.

i.e.

   data new;
     set data;
     perl on;
       perl code;
       ...
     perl off;
     sas code;
     .
     R on;
       R code;
       ;
     R off;
   run;

I am trying to get SAS users to do some of their processing in R (within
SAS). I am toying with a set of tips that show intuitive SAS code beside R
code, so SAS users can become more comfortable with R.

SAS is much more intuitive than R: compare, for instance, R 'for' loops with
their funny '}'s to the more intuitive SAS do/ends.

I could expound on the type of problems perl handles better than SAS or R,
problems R handles better than SAS or perl and problems SAS handles better
than R or perl.

-- 
View this message in context: 
http://n4.nabble.com/two-questions-for-R-beginners-tp1569384p1572149.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] two questions for R beginners

2010-02-27 Thread Kingsford Jones
On Fri, Feb 26, 2010 at 8:00 AM, Robert Baer rb...@atsu.edu wrote:
[...]
 The things that led from frustration to independence was understanding
 the difference between data types like matrix and dataframe and learning
 there were commands to tell what you were working with at any given time.
 Did the data read in as character, numeric, or factor, etc.  Commands
 like: str, class, mode, ls, search, help, help.search, etc can help you
 figure out what you are doing.

Yes!  I think this is really key.  When I started R I had no
programming experience and thought of projects in terms of statistical
procedures and printed output (cut teeth w/ Minitab -- SPSS -- SAS).
If I wanted to analyze data using R I looked for examples of using an
analysis function of interest (e.g., lm, princomp, rpart...) and did my
best to adapt them to my project.  What was of interest was the printed
output rather than understanding the objects that I was passing and
creating. It wasn't until I buckled down and read the (admittedly
quite dry and often dense) materials describing the language that the
sailing became smooth (or at least much more rapid and took me to more
interesting places).  Important resources I recall using were "An
Introduction to R" (which I avoided for about the first 6 months because of
language I wasn't yet familiar with), the r-help archives, man pages, and
particularly the early chapters of MASS and S Programming by V&R.  But
I think the real 'a-ha' moments came by interactively exploring
objects within R.  This was vastly facilitated by the use of str and
indexing tools ([, [[, $, @).

A mantra for R beginners might be "In R we work with objects, and str
reveals their essence" ;-)
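
A small sketch of that kind of interactive exploration, using the built-in cars data set; the choice of data set and model is an assumption made for illustration only.

```r
# Sketch using the built-in 'cars' data set (speed and stopping distance).
fit <- lm(dist ~ speed, data = cars)

class(cars)        # "data.frame"
class(fit)         # "lm"
str(cars)          # compact summary of the data frame's columns and types
names(fit)         # the components stored in the fitted model object

# Indexing tools pull pieces out of the objects.
cars$speed[1:3]                  # a column as a vector, via $
cars[["dist"]][1:3]              # the same idea with [[
cars[1:3, ]                      # the first rows of the data frame, via [
slope <- coef(fit)[["speed"]]    # one coefficient as a plain number
```

Running str(fit) as well shows that a fitted model is just a list with a class attribute, which is often the point where the printed output stops being a black box.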

Kingsford Jones



 Rob




 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf Of Patrick Burns
 Sent: Thursday, February 25, 2010 11:31 AM
 To: r-help@r-project.org
 Subject: [R] two questions for R beginners

 * What were your biggest misconceptions or
 stumbling blocks to getting up and running
 with R?

 * What documents helped you the most in this
 initial phase?

 I especially want to hear from people who are
 lazy and impatient.

 Feel free to write to me off-list.  Definitely
 write off-list if you are just confirming what
 has been said on-list.

 --
 Patrick Burns
 pbu...@pburns.seanet.com
 http://www.burns-stat.com
 (home of 'The R Inferno' and 'A Guide for the Unwilling S User')



Re: [R] two questions for R beginners

2010-02-26 Thread Ivan Calandra

Since you want input from beginners, here are some thoughts

I had and still have two big problems with R:
- this vectorization thing. I've read many manuals (including R 
inferno), but I'm still not completely clear about it. In simple 
examples, it's fine. But when it gets a bit more complex, then...
Related to it, the *apply functions are still a bit difficult to 
understand. When I have to use them, I just try one and see what 
happens. I don't understand them well enough to know which one I need.
- the second problem is where to find the functions/packages I need. 
There are many options, and that's actually the problem. R Wiki, Rseek, 
RSiteSearch, Crantastic, etc... When you start with R, you discover that 
the capabilities of R are almost unlimited and you don't really know 
where to start, where to find what you need.
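
To make the two sticking points above a little more concrete, here is a minimal sketch of vectorization next to the most common *apply functions. The vectors, list and matrix are made up for the example.

```r
x <- c(1.5, 2.0, 3.5)

# Vectorized: the operation applies to every element at once, no loop needed.
squared <- x^2

# The loop equivalent, for comparison.
squared_loop <- numeric(length(x))
for (i in seq_along(x)) squared_loop[i] <- x[i]^2

# The *apply family, on a small made-up list and matrix.
lst <- list(a = 1:3, b = 4:6)
as_list   <- lapply(lst, mean)              # always returns a list
as_vector <- sapply(lst, mean)              # simplifies to a named vector when it can
checked   <- vapply(lst, mean, numeric(1))  # like sapply, but the result type is declared

m <- matrix(1:6, nrow = 2)
row_sums <- apply(m, 1, sum)   # MARGIN = 1: one result per row
col_sums <- apply(m, 2, sum)   # MARGIN = 2: one result per column
```

A rough rule of thumb: lapply for lists in and lists out, sapply or vapply when a plain vector is wanted, apply for the rows or columns of a matrix.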


As noted in earlier posts, the mailing list is really great, but some 
people are really hard on beginners. It was noted in a discussion a 
few days ago, but it looks like some don't realize how difficult it is 
at the beginning to formulate a good, clear question with a 
self-contained example and so on. Moreover, not everybody speaks English 
natively. I don't mean that you must help, even when the question is 
really vague and not clear and whatever. I'm just saying that if you 
don't want to help (whatever the reason), you don't have to say it 
badly. But in any case, the mailing list is still really helpful. As 
someone noted (sorry I erased the email so I don't remember who), it 
might be a good idea to split it.


Hope that's what you wanted
Ivan


Le 2/26/2010 08:39, Dieter Menne a écrit :


Patrick Burns wrote:
   

* What were your biggest misconceptions or
stumbling blocks to getting up and running
with R?


 

(This derives partly from teaching)

The fact that this xapply-stuff was not idempotent (worse: not always) and
that you need a monster like do.call() to straighten this out. Nowadays,
plyr comes close.
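
One possible reading of the complaint above, sketched with made-up inputs: sapply() chooses its return shape from the results, and do.call() is the usual way to glue a list of pieces back together.

```r
# sapply() picks its return shape from the results, so it can vary.
r1 <- sapply(1:3, function(i) i^2)         # numeric vector of length 3
r2 <- sapply(1:3, function(i) c(i, i^2))   # 2 x 3 matrix
r3 <- sapply(1:3, function(i) seq_len(i))  # list, because the lengths differ

# lapply() always returns a list; do.call() then passes the list elements
# as separate arguments, here to rbind().
pieces <- lapply(1:3, function(i) c(n = i, square = i^2))
tab <- do.call(rbind, pieces)  # same as rbind(pieces[[1]], pieces[[2]], pieces[[3]])
```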

The concept of environment. With S it was worse, though.

That you cannot change values passed by reference. I noted that the latter
is no problem for students who have not worked with c(++/#) before. That
there is only one return-result in functions.
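
A short sketch of both points, with hypothetical function names: arguments behave as copies, and several results are returned by wrapping them in a list.

```r
# Arguments behave as copies: changing one inside a function
# leaves the caller's object untouched.
add_one <- function(v) {
  v[1] <- v[1] + 1
  v
}
a <- c(10, 20)
b <- add_one(a)
a   # still 10 20
b   # 11 20

# A function returns a single object, but that object can be a list
# carrying several results.
min_max <- function(v) list(min = min(v), max = max(v))
r <- min_max(c(3, 9, 1))
r$min   # 1
r$max   # 9
```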

[ and the likes as an operator.

10 years ago, when I started, the message was: S4 is the future, S3 is
legacy. So I learned S4. Only to never use it in self-written code later.
Might be different for BioConductor people.

That sometimes you can use vectors not in data= (lattice), and sometimes not
(ggplot2). Still a VERY confusing inconsistency.

The why-does-this-not-print FAQ.

Why does par(oma..) not work with lattice?

Dieter







Re: [R] two questions for R beginners

2010-02-26 Thread Mario Valle
My difficulties:
1) Statistics :-) well, I'm learning.
2) Understanding what is available *per subject area*. Something like the task
views for packages should be compiled for basic commands/functions: all
things related to string manipulation, all things related to number
formatting, all *apply things, and so on. Something similar is available for
the C runtime library functions (as in
http://msdn.microsoft.com/en-us/library/2aza74he(VS.71).aspx ) and is really
useful, also for expanding the number of functions one knows.
3) The diktat-like "avoid for loops!" without clear examples of alternatives.
I found them later on the mailing list, but at the beginning it is not simple,
especially coming from C/C++.
4) The for statement behaves differently from C/C++: for(i in 1:0) counts
backward instead of stopping.
5) Missing small things like ++var
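
Points 3 to 5 can be sketched in a few lines. This is a minimal illustration with made-up values, not something posted in the thread.

```r
n <- 0

# 1:n counts down when n is 0, so this loop runs twice (i = 1, then i = 0).
count_colon <- 0
for (i in 1:n) count_colon <- count_colon + 1

# seq_len(n) is empty when n is 0, so this loop body never runs.
count_seq <- 0
for (i in seq_len(n)) count_seq <- count_seq + 1

# Often the loop can be dropped altogether in favour of a vectorized call.
v <- c(2, 4, 6)
total <- sum(v * 2)   # instead of looping and accumulating

# There is no ++var; increment by assignment.
count_seq <- count_seq + 1
```

Using seq_len(n) or seq_along(x) in place of 1:n is the usual guard against the zero-length case.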

On the positive side:
- it is not too difficult to set up something simple to create a decent chart.
- it is possible to use for loops without feeling guilty. :-)
- documentation is very well done. Maybe some pages are still clear only to
those who already know the subject.
- there are zillions of courses/papers/tutorials to read
- after studying R by myself, I'm now becoming the local R expert, which from
a workplace point of view is not bad...

Hope it helps.
Ciao!
mario


Ivan Calandra wrote:
 Since you want input from beginners, here are some thoughts
 
 I had and still have two big problems with R:
 - this vectorization thing. I've read many manuals (including R 
 inferno), but I'm still not completely clear about it. In simple 
 examples, it's fine. But when it gets a bit more complex, then...
 Related to it, the *apply functions are still a bit difficult to 
 understand. When I have to use them, I just try one and see what 
 happens. I don't understand them well enough to know which one I need.
 - the second problem is where to find the functions/packages I need. 
 There are many options, and that's actually the problem. R Wiki, Rseek, 
 RSiteSearch, Crantastic, etc... When you start with R, you discover that 
 the capabilities of R are almost unlimited and you don't really know 
 where to start, where to find what you need.
 
 As noted in earlier posts, the mailing list is really great, but some 
 people are really hard with beginners. It was noted in a discussion a 
 few days ago, but it looks like some don't realize how difficult it is 
 at the beginning to formulate a good question, clear, with 
 self-contained example and so on. Moreover, not everybody speaks English 
 natively. I don't mean that you must help, even when the question is 
 really vague and not clear and whatever. I'm just saying that if you 
 don't want to help (whatever the reason), you don't have to say it 
 badly. But in any cases, the mailing list is still really helpful. As 
 someone noted (sorry I erased the email so I don't remember who), it 
 might be a good idea to split it.
 
 Hope that's what you wanted
 Ivan
 
 
 Le 2/26/2010 08:39, Dieter Menne a écrit :
 Patrick Burns wrote:

 * What were your biggest misconceptions or
 stumbling blocks to getting up and running
 with R?


  
 (This derives partly from teaching)

 The fact that this xapply-stuff was not idempotent (worse: not always) and
 that you need a monster like do.call() to straighten this out. Nowadays,
 plyr comes close.

 The concept of environment. With S it was worse, though.

 That you cannot change values passed by reference. I noted that the latter
 is no problem for students who have not worked with c(++/#) before. That
 there is only one return-result in functions.

 [ and the likes as an operator.

 10 years ago, when I started, the message was: S4 is the future, S3 is
 legacy. So I learned S4. Only to never use it in self-written code later.
 Might be different for BioConductor people.

 That sometimes you can use vectors not in data= (lattice), and sometimes not
 (ggplot2). Still a VERY confusing inconsistency.

 The why-does-this-not-print FAQ.

 Why does par(oma..) not work with lattice?

 Dieter



 

-- 
Ing. Mario Valle
Data Analysis and Visualization Group| http://www.cscs.ch/~mvalle
Swiss National Supercomputing Centre (CSCS)  | Tel:  +41 (91) 610.82.60
v. Cantonale Galleria 2, 6928 Manno, Switzerland | Fax:  +41 (91) 610.82.82



Re: [R] two questions for R beginners

2010-02-26 Thread Patrick Burns

Saeed,

If the R-help list were split, what do you
see as the pieces?

Pat

On 26/02/2010 01:53, Saeed Abu Nimeh wrote:

On Thu, Feb 25, 2010 at 9:31 AM, Patrick Burns pbu...@pburns.seanet.com wrote:

* What were your biggest misconceptions or
stumbling blocks to getting up and running
with R?


1- Compared to other programming languages it is hard to learn R by
example, because it is hard to find code on the web that will do the
exact thing you are looking for, sometimes you might get lucky though.
By contrast, take Perl for example, it is an easy language to learn by
example.

2- The R mailing list. Beginners get frustrated after they struggle
for a long time to solve a problem and the easiest thing then is to
send an email to the R mailing list. I did this in the past. The best
thing that happened was that my request was neglected and I had to
spend more time on the problem and find a solution by myself
eventually. Do not get me wrong, I am not saying that the mailing list
is bad, but it should be more organized. Maybe it could be broken down
into a couple of other mailing lists. This might bring up a good
discussion thread.



* What documents helped you the most in this
initial phase?


An Introduction to R by Venables
simpleR – Using R for Introductory Statistics by Verzani



--
Patrick Burns
pbu...@pburns.seanet.com
http://www.burns-stat.com
(home of 'The R Inferno' and 'A Guide for the Unwilling S User')



Re: [R] two questions for R beginners

2010-02-26 Thread Patrick Burns

On 25/02/2010 20:42, Greg Snow wrote:

Patrick,

I would add one more question:

* where did you look for help expecting answers, but did not find them?



Yes, an excellent additional question.

Pat


If you add hubris to laziness and impatience, you have Larry Wall's 3 virtues 
of a programmer.

[...]


--
Patrick Burns
pbu...@pburns.seanet.com
http://www.burns-stat.com
(home of 'The R Inferno' and 'A Guide for the Unwilling S User')



Re: [R] two questions for R beginners

2010-02-26 Thread Paul Hiemstra

Ivan Calandra wrote:

Since you want input from beginners, here are some thoughts

I had and still have two big problems with R:
- this vectorization thing. I've read many manuals (including R 
inferno), but I'm still not completely clear about it. In simple 
examples, it's fine. But when it gets a bit more complex, then...
Related to it, the *apply functions are still a bit difficult to 
understand. When I have to use them, I just try one and see what 
happens. I don't understand them well enough to know which one I need.
- the second problem is where to find the functions/packages I need. 
There are many options, and that's actually the problem. R Wiki, 
Rseek, RSiteSearch, Crantastic, etc... When you start with R, you 
discover that the capabilities of R are almost unlimited and you don't 
really know where to start, where to find what you need.


As noted in earlier posts, the mailing list is really great, but some 
people are really hard with beginners. It was noted in a discussion a 
few days ago, but it looks like some don't realize how difficult it is 
at the beginning to formulate a good question, clear, with 
self-contained example and so on. Moreover, not everybody speaks 
English natively. I don't mean that you must help, even when the 
question is really vague and not clear and whatever. I'm just saying 
that if you don't want to help (whatever the reason), you don't have 
to say it badly. But in any cases, the mailing list is still really 
helpful. As someone noted (sorry I erased the email so I don't 
remember who), it might be a good idea to split it.

Hi everyone,

My 2ct about the mailing list :). I understand that beginners have a 
hard time formulating a good question. But the problem is that we can't 
answer the question when it is unclear. So either I:


- Don't bother answering
- Try to discuss with the author of the question, taking lots of time to 
find out what exactly the question is.
- Send a "read the posting guide" answer

I mostly do the first, as I have to get things done during my PhD :). So 
this leaves us with kind of a problem, the person mailing the list 
doesn't have the knowledge to ask the right question, the list can't 
answer properly and consequently, the person mailing the list still 
doesn't get the information he/she needs. We could start an R-beginner 
mailing list, but this would also suffer from this problem. What do you 
guys think?


Maybe the mailing list is not the right medium for really basic stuff. 
For that I would recommend a good R-book or (better) a course in R or 
(even better) some colleagues who work with R that you can ask questions to.


cheers,
Paul


Hope that's what you wanted
Ivan


Le 2/26/2010 08:39, Dieter Menne a écrit :


Patrick Burns wrote:
  

* What were your biggest misconceptions or
stumbling blocks to getting up and running
with R?


 

(This derives partly from teaching)

The fact that this xapply-stuff was not idempotent (worse: not always) and
that you need a monster like do.call() to straighten this out. Nowadays,
plyr comes close.

The concept of environment. With S it was worse, though.

That you cannot change values passed by reference. I noted that the latter
is no problem for students who have not worked with c(++/#) before. That
there is only one return-result in functions.

[ and the likes as an operator.

10 years ago, when I started, the message was: S4 is the future, S3 is
legacy. So I learned S4. Only to never use it in self-written code later.
Might be different for BioConductor people.

That sometimes you can use vectors not in data= (lattice), and sometimes not
(ggplot2). Still a VERY confusing inconsistency.

The why-does-this-not-print FAQ.

Why does par(oma..) not work with lattice?

Dieter








--
Drs. Paul Hiemstra
Department of Physical Geography
Faculty of Geosciences
University of Utrecht
Heidelberglaan 2
P.O. Box 80.115
3508 TC Utrecht
Phone:  +3130 274 3113 Mon-Tue
Phone:  +3130 253 5773 Wed-Fri
http://intamap.geo.uu.nl/~paul



Re: [R] two questions for R beginners

2010-02-26 Thread Ivan Calandra
You are definitely right...
What to do with bad beginner's questions is not a simple issue.

If a beginner's mailing list is created, who will answer to such 
questions? And moreover, the beginners won't take advantage of the other 
questions (I've personally learned a lot trying to understand the 
questions and answers to other's problems). And also, as you said, the 
problems might persist.
The beginner's mailing list might be good in one aspect though: the 
experts who subscribe to it would be willing to help the beginners to 
get started with R, knowing that the questions might not be clearly stated.

As you pointed out, the mailing list is not the best for basic stuff 
(the question is of course what is basic?). Not everybody knows some 
colleagues who work with R (I'm personally the 1st one to use R in my lab).
I think, somehow and I have no idea how, documentation and guidance to 
search for help should be more accessible as soon as you start with R. 
Maybe a _*clear*_ section on the R homepage or in the "Introduction to 
R" manual, something like "where to find help", including all of the most 
common and useful resources available (from ? and RSiteSearch() to R Wiki 
and Crantastic).

I hope that this whole discussion might help to make the R world better.
Thank you Patrick for initiating it!
Regards,
Ivan

Le 2/26/2010 15:09, Paul Hiemstra a écrit :
 Ivan Calandra wrote:
 Since you want input from beginners, here are some thoughts

 I had and still have two big problems with R:
 - this vectorization thing. I've read many manuals (including R 
 inferno), but I'm still not completely clear about it. In simple 
 examples, it's fine. But when it gets a bit more complex, then...
 Related to it, the *apply functions are still a bit difficult to 
 understand. When I have to use them, I just try one and see what 
 happens. I don't understand them well enough to know which one I need.
 - the second problem is where to find the functions/packages I need. 
 There are many options, and that's actually the problem. R Wiki, 
 Rseek, RSiteSearch, Crantastic, etc... When you start with R, you 
 discover that the capabilities of R are almost unlimited and you 
 don't really know where to start, where to find what you need.

 As noted in earlier posts, the mailing list is really great, but some 
 people are really hard with beginners. It was noted in a discussion a 
 few days ago, but it looks like some don't realize how difficult it 
 is at the beginning to formulate a good question, clear, with 
 self-contained example and so on. Moreover, not everybody speaks 
 English natively. I don't mean that you must help, even when the 
 question is really vague and not clear and whatever. I'm just saying 
 that if you don't want to help (whatever the reason), you don't have 
 to say it badly. But in any cases, the mailing list is still really 
 helpful. As someone noted (sorry I erased the email so I don't 
 remember who), it might be a good idea to split it.
 Hi everyone,

 My 2ct about the mailing list :). I understand that beginners have a 
 hard time formulating a good question. But the problem is that we 
 can't answer the question when it is unclear. So either I:

 - Don't bother answering
 - Try to discuss with the author of the question, taking lots of time 
 to find out what exactly the question is.
 - Send a "read the posting guide" answer

 I mostly do the first, as I have to get things done during my PhD :). 
 So this leaves us with kind of a problem, the person mailing the list 
 doesn't have the knowledge to ask the right question, the list can't 
 answer properly and consequently, the person mailing the list still 
 doesn't get the information he/she needs. We could start an R-beginner 
 mailing list, but this would also suffer from this problem. What do 
 you guys think?

 Maybe the mailing list is not the right medium for really basic stuff. 
 For that I would recommend a good R-book or (better) a course in R or 
 (even better) some colleagues who work with R that you can ask 
 questions to.

 cheers,
 Paul

 Hope that's what you wanted
 Ivan


 Le 2/26/2010 08:39, Dieter Menne a écrit :

 Patrick Burns wrote:
 * What were your biggest misconceptions or
 stumbling blocks to getting up and running
 with R?


 (This derives partly from teaching)

 The fact that this xapply-stuff was not idempotent (worse: not always)
 and that you need a monster like do.call() to straighten this out.
 Nowadays, plyr comes close.
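
 The complaint about the *apply family can be made concrete with a small sketch. This is only an illustration with made-up inputs, not code from the thread: it shows sapply() returning three different shapes depending on what the function returns, and the lapply() plus do.call() idiom that gives a predictable result.

```r
# sapply() simplifies when it can, so the shape of the result depends on
# what the function returns for each element.
squares  <- sapply(1:3, function(i) i^2)         # scalars: a plain vector
two_rows <- sapply(1:3, function(i) c(i, i^2))   # equal-length vectors: a matrix
ragged   <- sapply(1:3, function(i) seq_len(i))  # unequal lengths: a list

# lapply() always returns a list; do.call() then passes the list elements
# to rbind() as separate arguments, giving one data frame.
rows   <- lapply(1:3, function(i) data.frame(i = i, square = i^2))
joined <- do.call(rbind, rows)

# vapply() states the expected type up front and errors if it is violated.
checked <- vapply(1:3, function(i) i^2, numeric(1))
```

 Using vapply() or lapply() in scripts avoids surprises when the input changes size; sapply() is mostly convenient at the prompt.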

 The concept of environment. With S it was worse, though.

 That you cannot change values passed by reference. I noted that the
 latter is no problem for students who have not worked with c(++/#)
 before. That there is only one return-result in functions.

 [ and the likes as an operator.

 10 years ago, when I started, the message was: S4 is the future, S3 is
 legacy. So I learned S4. Only to never use it in self-written code
 later. Might be different for BioConductor people.

 That sometimes you can 

Re: [R] two questions for R beginners

2010-02-26 Thread Allen S. Rout
Ivan Calandra ivan.calan...@uni-hamburg.de writes:

 Related to it, the *apply functions are still a bit difficult to
 understand. When I have to use them, I just try one and see what
 happens. I don't understand them well enough to know which one I
 need.


Ditto.  I have ended up with a small collection of black magic
invocations copied from other folks' code, designed to do things like

I wrote a function to read a file and generate a data frame.  Now I
want to iterate (vectorize) this over many files, and get a much
larger data frame.


This may be one specific case of the larger challenge of transforming
R data structures.  A somewhat pedantic set of recipes might usefully
be evolved on e.g. the wiki.
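
One possible recipe for the "many files, one data frame" case described above, as a sketch only: read_one is a hypothetical stand-in for the reader function, and the two CSV files are created in a temporary directory just so the example is self-contained.

```r
# Hypothetical reader: one file in, one data frame out
read_one <- function(path) {
  df <- read.csv(path, stringsAsFactors = FALSE)
  df$source <- basename(path)  # remember which file each row came from
  df
}

# Create two small demo files so the sketch can run anywhere
demo_dir <- tempfile("demo")
dir.create(demo_dir)
write.csv(data.frame(x = 1:2, y = c("a", "b")),
          file.path(demo_dir, "one.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:5, y = c("c", "d", "e")),
          file.path(demo_dir, "two.csv"), row.names = FALSE)

# Apply the reader to every file, then stack the pieces row-wise
files    <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
pieces   <- lapply(files, read_one)
combined <- do.call(rbind, pieces)
```

The same pattern works for any reader that returns data frames with identical columns.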



- Allen S. Rout



Re: [R] two questions for R beginners

2010-02-26 Thread Robert Baer
Honestly what I remember as the most difficult thing when I 'first'
started using R was figuring out how to read in my own datasets.  I
eventually discovered the R import/export manual, but somehow this eluded
me initially.  All the R tutorials I was working from simply generated
data or used the built in datasets, and I was ready to work on my own
datasets.

The thing that led from frustration to independence was understanding
the difference between data types like matrix and dataframe and learning
there were commands to tell what you were working with at any given time:
did the data read in as character, numeric, or factor, etc.? Commands
like str, class, mode, ls, search, help, help.search, etc. can help you
figure out what you are doing.
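
A short sketch of those inspection commands on two toy objects (the objects m and d are made up for illustration):

```r
m <- matrix(1:6, nrow = 2)
d <- data.frame(id = 1:3, grp = c("a", "b", "a"), stringsAsFactors = TRUE)

class(m)   # what kind of object is this?
class(d)
mode(m)    # storage mode of the underlying data
str(d)     # compact overview: dimensions, column names, column types

# Type of every column at once: did grp come in as character or factor?
col_classes <- sapply(d, class)
col_classes

ls()       # which objects exist in the workspace right now?
```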

Rob




-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of Patrick Burns
Sent: Thursday, February 25, 2010 11:31 AM
To: r-help@r-project.org
Subject: [R] two questions for R beginners

* What were your biggest misconceptions or
stumbling blocks to getting up and running
with R?

* What documents helped you the most in this
initial phase?

I especially want to hear from people who are
lazy and impatient.

Feel free to write to me off-list.  Definitely
write off-list if you are just confirming what
has been said on-list.

-- 
Patrick Burns
pbu...@pburns.seanet.com
http://www.burns-stat.com
(home of 'The R Inferno' and 'A Guide for the Unwilling S User')



Re: [R] two questions for R beginners

2010-02-26 Thread Alain Guillet
I don't want to sound harsh, but the first thing beginners should do is 
look at the manual "An Introduction to R", because most of the simple 
questions are answered in it. In the same vein, before posting 
to this mailing list, people should (must?) follow the posting guide. 
Indeed, it says there to use functions like help.search() and 
RSiteSearch(), or to read "An Introduction to R", before posting. Too 
often I wish people would do their homework before 
posting.


I would like to add that I don't consider myself an R expert, but I don't like to 
waste my time answering questions whose answers you can find 
easily if you respect the posting guide.


Regards,
Alain



Re: [R] two questions for R beginners

2010-02-26 Thread Paul Hiemstra

Ivan Calandra wrote:

You are definitely right...
What to do with bad beginner's questions is not a simple issue.

If a beginner's mailing list is created, who will answer to such 
questions? And moreover, the beginners won't take advantage of the other 
questions (I've personally learned a lot trying to understand the 
questions and answers to other's problems). And also, as you said, the 
problems might persist.
The beginner's mailing list might be good in one aspect though: the 
experts who subscribe to it would be willing to help the beginners to 
get started with R, knowing that the questions might not be clearly stated.


As you pointed out, the mailing list is not the best for basic stuff 
(the question is of course what is "basic"?). Not everybody knows some 
colleagues who work with R (I'm personally the 1st one to use R in my lab).
I think, somehow and I have no idea how, documentation and guidance to 
search for help should be more accessible as soon as you start with R. 
Maybe a _*clear*_ section on the R homepage or in the "Introduction to 
R" manual like "where to find help", including all of the most common 
and useful resources available (from ? and RSiteSearch() to R Wiki and 
Crantastic).
  

Hi Ivan (and list),

I think the main problem is not as much that there isn't structure in 
the way R provides documentation / tutorials, but that people have a 
hard time finding the structure. There are task views for certain 
specific fields, but I think a lot of beginners do not know that they 
exist. There are separate mailing lists for specific fields, but I often 
see geographical (my field of expertise) oriented questions on R-help 
that would fit much better on R-sig-geo.


So I think an "O my God, I've downloaded R and what now" tutorial might 
be a good idea to put very close to the download button of R on CRAN. 
This tutorial would focus not on how to do things in R, but would 
provide guidance to the most obvious sources of information such as Task 
views, specific mailing lists, ways to search list archives, information 
for beginners on how to write a good e-mail etc. I think for a lot of 
beginners it is not so much the answer to a specific question that they 
need, but more guidance on how to look for answers themselves.


But at the end of the day, R is still not very easy to learn when coming 
from GUI oriented stats programs. In addition, to become reasonably 
fluent in R, you need to spend at least a few hours a week on it. So I 
think we can ease the pain for beginners, but not take away that it 
takes quite some time to become fluent in R.


cheers,
Paul


Re: [R] two questions for R beginners

2010-02-26 Thread Ivan Calandra

Hi again Paul,


Hi Ivan (and list),

I think the main problem is not as much that there isn't structure in 
the way R provides documentation / tutorials, but that people have a 
hard time finding the structure. There are task views for certain 
specific fields, but I think a lot of beginners do not know that they 
exist. 


You're definitely right... what is it?! where to find them?

So I think an "O my God, I've downloaded R and what now" tutorial might 
be a good idea to put very close to the download button of R on CRAN. 
This tutorial would focus not on how to do things in R, but would 
provide guidance to the most obvious sources of information such as 
Task views, specific mailing lists, ways to search list archives, 
information for beginners how to write a good e-mail etc. I think for 
a lot of beginners it is not as much the answer to a specific question 
that they need, but more guidance how to look for answers themselves.


I think that would indeed help a lot. I can only agree with your last 
sentence. Is someone already working on this kind of manual? Is it 
planned?



cheers,
Paul

Regards,
Ivan



Re: [R] two questions for R beginners

2010-02-26 Thread Thomas Adams

Paul,

I think your point "you need [to] spend at least a few hours a week on 
it" is key. Since I am not doing statistics daily, more in fits & starts 
as my latest project -may- require, my approach has been more task 
oriented. A less-than-ideal approach. So, I think your suggestion is 
on-the-mark.


Tom



Re: [R] two questions for R beginners

2010-02-26 Thread Paul Hiemstra

Thomas Adams wrote:

Paul,

I think your point "you need [to] spend at least a few hours a week on 
it" is key. Since I am not doing statistics daily, more in fits & 
starts as my latest project -may- require, my approach has been more 
task oriented. A less-than-ideal approach. So, I think your suggestion 
is on-the-mark.


Tom

I also see co-workers who would like to work with R, see the benefit of 
R etc, but don't have the time to learn and maintain R. But I'm not 
really sure how to fix this; it seems impossible to have both ease of 
use and intuitiveness on one side and power and flexibility on the other.


cheers,
Paul




Re: [R] two questions for R beginners

2010-02-26 Thread Claudia Beleites

Dear Patrick (and all)

I have now been working with R for a couple of years; before that I worked mostly in Matlab.
Lazy & impatient are both true for me :-)


* What were your biggest misconceptions or
stumbling blocks to getting up and running
with R?


 * What documents helped you the most in this
 initial phase?

 I especially want to hear from people who are
 lazy and impatient.

 Feel free to write to me off-list.  Definitely
 write off-list if you are just confirming what
 has been said on-list.


Stumbling:

* It took me a long time to remember
getwd () and setwd () (instead of pwd and cd / chdir or the like)

* I still discover very useful functions that I could have used long 
ago. Latest discoveries: mapply and ave.
I knew aggregate, and was always a little angry that it needs a grouping list. I 
even decided that the aggregate method for my hyperSpec class should work with 
factors as well as with lists. Some day I read on this mailing list that ave 
does what I need...
I like the cross-links in the help ("See Also") very much. Maybe I rely too much on 
them. So, not being lazy today, I attach a patch for aggregate.Rd that adds ave 
to its "See Also" section.
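
To illustrate the difference between the two functions mentioned above, here is a small sketch on a made-up data frame (the names d, g and v are assumptions for the example only):

```r
d <- data.frame(g = c("a", "a", "b"), v = c(1, 3, 10))

# aggregate() wants the grouping as a list and returns one row per group
agg <- aggregate(d$v, by = list(g = d$g), FUN = mean)

# ave() takes the grouping vector directly and returns a vector as long
# as the input, with each element replaced by its group's mean (the default FUN)
d$group_mean <- ave(d$v, d$g)
```

So ave() is the one to reach for when the group statistic should sit next to the original rows, and aggregate() when a summary table is wanted.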


Reading this mailing list once in a while gives me nice new ideas. However,  50 
emails / d is somewhat scary for me, so I read only occasionally.


* Vectorization: I like the *apply functions,
but I'd really appreciate a comprehensive page/vignette here.
I remember that it took me a while to realize that the rule for MARGIN in sweep 
is "use the same number as in the apply that created the STATS".


* I never found the pdf manuals helpful (help pages are easier to access, and 
there is nothing in the pdf that the help doesn't have).

At the beginning I expected the pdf manual to be what the vignettes 
are.

* I did not arrive at a comfortable debugging cycle for a long time. But now 
there's the debug package and setBreakpoint and I'm happy


* As I now start teaching I notice that many students react to error messages 
with "uhh! an error!" (panic). Few realize that the error message actually gives 
information on what went wrong.
A list with common causes of different error messages would be helpful here, I 
think.
In case someone agrees: I started one at the Wiki: 
http://rwiki.sciviews.org/doku.php?id=tips:errormessages
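
Returning to the MARGIN rule for sweep mentioned above, a minimal sketch on a toy matrix (the object names are made up for illustration):

```r
m <- matrix(1:6, nrow = 2)

# Statistics computed per column (MARGIN = 2) ...
col_means <- apply(m, 2, mean)
# ... are swept out with the same MARGIN; the default FUN is "-"
centered <- sweep(m, 2, col_means)

# Statistics computed per row (MARGIN = 1) are swept out with MARGIN = 1
row_max <- apply(m, 1, max)
scaled  <- sweep(m, 1, row_max, FUN = "/")
```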



Cheers,

Claudia





Re: [R] two questions for R beginners

2010-02-26 Thread Saeed Abu Nimeh
Pat,
Off the bat, beginners and advanced. In addition, splitting by domain
would be very helpful -- something along the lines of:
http://cran.r-project.org/web/views/. But we should be careful, we do
not want to create 20 other mailing lists :) We have to group things.
This will help splitting the volume of the list and will help in
targeting lists by expertise.
Thanks,
Saeed

On Fri, Feb 26, 2010 at 2:08 AM, Patrick Burns pbu...@pburns.seanet.com wrote:
 Saeed,

 If the R-help list were split, what do you
 see as the pieces?

 Pat

 On 26/02/2010 01:53, Saeed Abu Nimeh wrote:

 On Thu, Feb 25, 2010 at 9:31 AM, Patrick Burnspbu...@pburns.seanet.com
  wrote:

 * What were your biggest misconceptions or
 stumbling blocks to getting up and running
 with R?

 1- Compared to other programming languages it is hard to learn R by
 example, because it is hard to find code on the web that will do the
 exact thing you are looking for, sometimes you might get lucky though.
 By contrast, take Perl for example, it is an easy language to learn by
 example.

 2- The R mailing list. Beginners get frustrated after they struggle
 for a long time to solve a problem and the easiest thing then is to
 send an email to the R mailing list. I did this in the past. The best
 thing that happened was that my request was neglected and I had to
 spend more time on the problem and find a solution by myself
 eventually. Do not get me wrong, I am not saying that the mailing list
 is bad, but it should be more organized. Maybe broken down into a couple
 of other mailing lists. This might bring up a good discussion thread.


 * What documents helped you the most in this
 initial phase?

 An Introduction to R by Venables
 simpleR – Using R for Introductory Statistics by Verzani


 --
 Patrick Burns
 pbu...@pburns.seanet.com
 http://www.burns-stat.com
 (home of 'The R Inferno' and 'A Guide for the Unwilling S User')




Re: [R] two questions for R beginners

2010-02-26 Thread Seeliger . Curt
Patrick Burns pbu...@pburns.seanet.com
 * What were your biggest misconceptions or
 stumbling blocks to getting up and running
 with R?

I came into R from SAS, with its powerful data step language and very 
simplified data types.  Most of my work is data manipulation prior to a 
variety of univariate statistical calculations.  The vector-based nature 
of R, and thus the variety of indexing schemes used, was a big conceptual 
hurdle. 

The often unhelpful attitude of several list respondents, while not unique 
to this list, was and continues to be another block to advancement.  This 
does not occur on the list for SAS, in which asking 'dumb' questions is 
generally supported as an inevitable part of learning.  Having aggregate() 
pointed out to me by one kind soul, hidden amidst the assortment of 
by()/apply() functions, became the basis for much success.

I am currently trying to wrap my mind around how missing values are 
handled; the defaults are quite different from those in SAS, and mostly in a good 
way.  However, the handling of NA values in slicing statements does not 
seem quite proper, even if it is addressed in the R documents.
    # Small data frame, then set x to NA in the row where x is 3
    aa <- data.frame('id'=letters[1:5], 'x'=1:5, stringsAsFactors=FALSE)
    aa[aa$x == 3,]$x <- NA
    aa[aa$x == '4',]    # 2 rows instead of 1: the NA comparison adds an all-NA row
    aa[aa$x %in% '4',]  # 1 row as expected: %in% never returns NA
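
For comparison, a sketch of a few ways to keep the NA row out of the result. It rebuilds the same toy data frame; which() and subset() are standard base R alternatives, not something proposed in the message itself.

```r
aa <- data.frame(id = letters[1:5], x = 1:5, stringsAsFactors = FALSE)
aa$x[aa$x == 3] <- NA

# A logical index containing NA produces an extra row filled with NA
with_na <- aa[aa$x == 4, ]

# which() keeps only the positions that are TRUE, dropping FALSE and NA
by_which <- aa[which(aa$x == 4), ]

# %in% returns FALSE rather than NA for missing values
by_in <- aa[aa$x %in% 4, ]

# subset() treats NA in the condition as FALSE
by_subset <- subset(aa, x == 4)
```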

I am also looking for concise methods for building up dataframes for our 
unit tests.  While there are several ways to accomplish this, depending on 
what is needed, none are elegant though expand.grid() comes close.
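
As a sketch of the expand.grid() approach for test data (the column names site, year and value are made up for illustration):

```r
# Every combination of the supplied values, one row each;
# the first argument varies fastest
design <- expand.grid(site = c("A", "B"), year = 2001:2003,
                      stringsAsFactors = FALSE)

# Add a measurement column so the frame can feed a unit test
design$value <- seq_len(nrow(design))
```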

Next: The R Inferno.  I *will* understand more than the first few pages. 
And all those apply()-ish functions, as I'm already good friends with 
aggregate().

 * What documents helped you the most in this
 initial phase?

RSeek.org was and continues to be a big source of help. I've looked at 
several texts aimed at beginners, and all provided simple examples that 
were useful.  The most consistent source of instruction has been to make 
up my own small projects that were either fun or slightly relevant to my 
job.  The ability to make up toy problems, or to simplify a complex process, 
has been an unexpectedly important skill.  Developing unit tests for 
functions, initially seen as an irritant by some, has become an important 
tool for honing our advances.

 I especially want to hear from people who are
 lazy and impatient.

And, I hope, incompetent.  I've found incompetence to be as professionally 
important as hubris.  I wouldn't want one without the other.

cur

-- 
Curt Seeliger, Data Ranger
Raytheon Information Services - Contractor to ORD
seeliger.c...@epa.gov
541/754-4638




Re: [R] two questions for R beginners

2010-02-26 Thread Jack Siegrist

My biggest impediment, as a scientist without previous programming
experience, is that the R help is not beginner-friendly. I think it is
probably great for experienced programmers and for the people who helped to
create the software, to help them remember what they did, but I think it is
very difficult for a newcomer without a strong programming background to
learn about a new function or to discover the name of a function that you
are pretty sure should already exist. Maybe this wouldn’t matter for most
programming languages, but as free statistics software R is obviously going
to attract many scientists who want to get an analysis done and have varying
levels of experience with programming. 

I found it much easier to learn how to use Mathematica, using only the
online help. With R I had to buy several books to get a handle on it, which
is fine, but even the books that I have found to be most useful tend to be
didactically lacking—either too cursory or mired in unexplained programming
jargon. They are OK just not great.

What I think would be very helpful is an introduction to programming using
R, preferably a big thick college textbook that takes at least a semester to
go through, which should be a prerequisite for going through the
Introduction to R available on CRAN.

Also to do any analysis on real data you have to use the apply family of
functions to perform different functions by groups. A long introduction to
these functions, with lots of comparisons and contrasts between them would
be very helpful.
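
As a small illustration of the kind of side-by-side comparison asked for here, this sketch computes the same group means three ways. The data frame d and its columns are made up for the example:

```r
# Hypothetical toy data: one measurement per row, with a grouping column.
d <- data.frame(group = c("a", "a", "b", "b", "b"),
                value = c(1, 3, 2, 4, 6))

# tapply(): returns a named array, one element per group.
by_tapply <- tapply(d$value, d$group, mean)

# aggregate(): returns a data frame, one row per group.
by_aggregate <- aggregate(value ~ group, data = d, FUN = mean)

# split() + sapply(): split into a list of vectors, then apply to each piece.
by_sapply <- sapply(split(d$value, d$group), mean)
```

The numbers agree; what differs is the shape of the result, which is often what decides which function to reach for.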

A few random examples concerning the R help: 

In my version of R (2.7.0 on Windows XP) typing
> ?+
doesn’t do anything, but then if you type in the next line
+ ?sum
you get the “Arithmetic Operators” help page.
If you had just typed
> ?sum
in the first place you get the “Sum of Vector Elements” help page. 
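
A likely explanation is that a bare ?+ is an incomplete expression, so R shows the continuation prompt and waits for more input. A minimal sketch of the usual workaround, quoting the operator (the variable h is only there so the example can be run non-interactively):

```r
# Quote the operator so the parser sees a complete expression.
# At the prompt, ?"+" or ?`+` does the same thing.
h <- help("+")

# h holds the path(s) of the matching help page; printing it displays the page.
found <- length(h) > 0
```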

Most examples in the R help pages use way too many other functions to be
useful to a beginner. If an example uses 10 other functions besides the one
being described, chances are a beginner won’t know what one of them does,
which can set off a chain of having to look up other irrelevant functions.

Some function names in the base package are goofy, such as “rowsum” which is
used to “compute column sums across rows”, not to be confused with “rowSums”
which computes row sums.
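
To make the difference concrete, here is a small sketch on a made-up 3 x 2 matrix; the grouping labels "x" and "y" are arbitrary:

```r
m <- matrix(1:6, nrow = 3)   # rows are (1,4), (2,5), (3,6)

# rowSums(): one total per row.
per_row <- rowSums(m)

# rowsum(): column sums within each level of a grouping vector over the rows.
# Rows 1 and 2 fall in group "x", row 3 in group "y".
per_group <- rowsum(m, group = c("x", "x", "y"))
```

Here per_row has three elements, while per_group is a matrix with one row per group and the original two columns.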

-- 
View this message in context: 
http://n4.nabble.com/two-questions-for-R-beginners-tp1569384p1571243.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] two questions for R beginners

2010-02-26 Thread Saeed Abu Nimeh

Hi Ivan,

On 2/26/10 6:30 AM, Ivan Calandra wrote:

You are definitely right...
What to do with bad beginner's questions is not a simple issue.

If a beginner's mailing list is created, who will answer to such
questions?


If I subscribe to the beginners mailing list, then I have to expect 
novice questions and I should be willing to help. Otherwise, I should 
not be there.


And moreover, the beginners won't take advantage of the other
questions (I've personally learned a lot trying to understand the
questions and answers to others' problems).


They can still subscribe to the advanced, but they will know that they 
are here to observe and learn, not to ask novice questions. You want to 
ask basic stuff, go to the beginners list :)


Not sure if you guys have been on some of the linux mailing lists out 
there, but man let me tell you, some of these lists have a RTFM attitude 
and they will fry you if you ask novice questions. Frankly, that is 
understandable, as most of the members are geeks and they have higher 
expectations. This mailing list is different, I have seen posts from 
different disciplines; biology, biostats, stats, computer science, 
oceanography, etc. So, IMO, there should be a beginners list to cope 
with such broad committee.


Thanks,
Saeed

And also, as you said, the problems might persist.
The beginner's mailing list might be good in one aspect though: the
experts who subscribe to it would be willing to help the beginners to
get started with R, knowing that the questions might not be clearly stated.

As you pointed out, the mailing list is not the best for basic stuff
(the question is of course what is basic?). Not everybody knows some
colleagues who work with R (I'm personally the 1st one to use R in my lab).
I think, somehow and I have no idea how, documentation and guidance to
search for help should be more accessible as soon as you start with R.
Maybe a _*clear*_ section on the R homepage or in the introduction to
R manual like where to find help, including all of the most common
and useful resources available (from ? and RSiteSearch() to R Wiki and
Crantastic).

I hope that this whole discussion might help to make the R world better.
Thank you Patrick for initiating it!
Regards,
Ivan

On 2/26/2010 15:09, Paul Hiemstra wrote:

Ivan Calandra wrote:

Since you want input from beginners, here are some thoughts

I had and still have two big problems with R:
- this vectorization thing. I've read many manuals (including R
inferno), but I'm still not completely clear about it. In simple
examples, it's fine. But when it gets a bit more complex, then...
Related to it, the *apply functions are still a bit difficult to
understand. When I have to use them, I just try one and see what
happens. I don't understand them well enough to know which one I need.
- the second problem is where to find the functions/packages I need.
There are many options, and that's actually the problem. R Wiki,
Rseek, RSiteSearch, Crantastic, etc... When you start with R, you
discover that the capabilities of R are almost unlimited and you
don't really know where to start, where to find what you need.

As noted in earlier posts, the mailing list is really great, but some
people are really hard with beginners. It was noted in a discussion a
few days ago, but it looks like some don't realize how difficult it
is at the beginning to formulate a good question, clear, with
self-contained example and so on. Moreover, not everybody speaks
English natively. I don't mean that you must help, even when the
question is really vague and not clear and whatever. I'm just saying
that if you don't want to help (whatever the reason), you don't have
to say it badly. But in any case, the mailing list is still really
helpful. As someone noted (sorry I erased the email so I don't
remember who), it might be a good idea to split it.

Hi everyone,

My 2ct about the mailing list :). I understand that beginners have a
hard time formulating a good question. But the problem is that we
can't answer the question when it is unclear. So either I:

- Don't bother answering
- Try to discuss with the author of the question, taking lots of time
to find out what exactly is the question.
- Send a "read the posting guide" answer

I mostly do the first, as I have to get things done during my PhD :).
So this leaves us with kind of a problem, the person mailing the list
doesn't have the knowledge to ask the right question, the list can't
answer properly and consequently, the person mailing the list still
doesn't get the information he/she needs. We could start an R-beginner
mailing list, but this would also suffer from this problem. What do
you guys think?

Maybe the mailing list is not the right medium for really basic stuff.
For that I would recommend a good R-book or (better) a course in R or
(even better) some colleagues who work with R that you can ask
questions to.

cheers,
Paul


Hope that's what you wanted
Ivan


Le 2/26/2010 08:39, Dieter Menne a 

Re: [R] two questions for R beginners

2010-02-26 Thread Saeed Abu Nimeh

sorry meant community not committee

On 2/26/10 8:36 PM, Saeed Abu Nimeh wrote:

This mailing list is different, I have seen posts from
different disciplines; biology, biostats, stats, computer science,
oceanography, etc. So, IMO, there should be a beginners list to cope
with such broad committee.



Re: [R] two questions for R beginners

2010-02-26 Thread Gabor Grothendieck
On Fri, Feb 26, 2010 at 1:28 PM, Saeed Abu Nimeh sabun...@gmail.com wrote:
 Pat,
 Off the bat, beginners and advanced. In addition, splitting by domain
 would be very helpful -- something along the lines of:
 http://cran.r-project.org/web/views/. But we should be careful, we do
 not want to create 20 other mailing lists :) We have to group things.

Note that there are already 24 mailing lists here:
http://www.r-project.org/mail.html

 This will help splitting the volume of the list and will help in
 targeting lists by expertise.
 Thanks,
 Saeed

 On Fri, Feb 26, 2010 at 2:08 AM, Patrick Burns pbu...@pburns.seanet.com 
 wrote:
 Saeed,

 If the R-help list were split, what do you
 see as the pieces?

 Pat

 On 26/02/2010 01:53, Saeed Abu Nimeh wrote:

 On Thu, Feb 25, 2010 at 9:31 AM, Patrick Burns pbu...@pburns.seanet.com
  wrote:

 * What were your biggest misconceptions or
 stumbling blocks to getting up and running
 with R?

 1- Compared to other programming languages it is hard to learn R by
 example, because it is hard to find code on the web that will do the
 exact thing you are looking for, sometimes you might get lucky though.
 By contrast, take Perl for example, it is an easy language to learn by
 example.

 2- The R mailing list. Beginners get frustrated after they struggle
 for a long time to solve a problem and the easiest thing then is to
 send an email to the R mailing list. I did this in the past. The best
 thing that happened was that my request was neglected and I had to
 spend more time on the problem and find a solution by myself
 eventually. Do not get me wrong, I am not saying that the mailing list
 is bad, but it should be more organized. Maybe broken down into a couple
 of other mailing lists. This might bring up a good discussion thread.


 * What documents helped you the most in this
 initial phase?

 An Introduction to R by Venables
 simpleR – Using R for Introductory Statistics by Verzani


 --
 Patrick Burns
 pbu...@pburns.seanet.com
 http://www.burns-stat.com
 (home of 'The R Inferno' and 'A Guide for the Unwilling S User')




Re: [R] two questions for R beginners

2010-02-26 Thread GlenB


Lazy and impatient? That's me! 

I find it hard to say what my biggest misconceptions were.

Here's one thing:

What I realized very early on:

 - many data analysis functions return a bunch of stuff, not all of which
you see when you print() it

what I *failed* to realize:

- The bunch of stuff such functions return is just a *list* 

that has follow-on implications:

- even if you're just doing some simple analysis like a linear regression,
if you want to be able to see/get all the information, you really need to
learn how to examine what's in a list and how to operate on the list.

I had seen lists as potentially useful but not something I need to worry
about right now, since I'm having enough trouble just grokking why
dataframes look different to matrices, whereas I needed to know that lists
were absolutely central to what I was trying to achieve.

While I have no doubt this information can be found in a dozen places, I
read a bunch of introductory documents at the time, and I don't recall it
being stated explicitly like that in any of the places I looked. It made a
big difference to me when I realized that so many functions just return a
list. I mean, it's obvious, and I should have seen that's all it was the
first time, but I didn't.
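
A minimal sketch of the point, using the built-in cars data set and lm() purely as an example; the names fit, slope, s and r2 are illustrative:

```r
fit <- lm(dist ~ speed, data = cars)   # built-in 'cars' data set

# An "lm" object is an ordinary list with a class attribute.
fit_is_list <- is.list(fit)
components <- names(fit)      # everything stored, not just what print() shows
str(fit, max.level = 1)       # compact overview of each component

# Pieces are extracted with the usual list tools, $ and [[.
slope <- fit$coefficients[["speed"]]

# summary() returns yet another list, with its own components.
s <- summary(fit)
r2 <- s$r.squared
```

The same names() and str() calls should work on the return value of most modelling functions, which is a quick way to find out what is available.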

-- 
View this message in context: 
http://n4.nabble.com/two-questions-for-R-beginners-tp1569384p1571715.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] two questions for R beginners

2010-02-25 Thread Clint Bowman
I started using statistical software with the commercial product S+ 
when I obtained a new HP735 workstation.  We kept the S+ license 
going for a number of years until I heard about R.  It was an easy 
transition and because I have been proficient in fortran and perl, 
the scripting came naturally--except for some syntax 
similarities/differences between perl and R interacting with a 
natural tendency towards dyslexia.


I especially like that I can slice and dice the data to ferret out 
relationships e.g., concentration by hour of day, by month, by wind 
speed, by wind direction--love those boxplots.


I also find that even the default settings produce some pretty 
attractive plots that are useable in many settings--I've also 
produced some pretty awful ones.


And the price always reminds me that I need to find every way 
possible to contribute to the overall good--I've forgotten too much 
of my fortran and C programming skills to contribute directly to 
the R Project.


Clint

--
Clint Bowman            INTERNET:   cl...@ecy.wa.gov
Air Quality Modeler     INTERNET:   cl...@math.utah.edu
Department of Ecology   VOICE:      (360) 407-6815
PO Box 47600            FAX:        (360) 407-7534
Olympia, WA 98504-7600

On Thu, 25 Feb 2010, Ralf B wrote:


My biggest blocker was my misconception that R is extremely difficult
to start with. It is powerful and one can do very complicated things
(that consequently turn things complicated) but it comes with very
nice defaults and one can produce great results with standard tasks in
very little time - especially if one has done programming and/or
scripting before.

I pushed it away for too long that way. I wish I had used it
years ago and avoided SPSS altogether - must have wasted 100s of hours
doing repetitive tasks by click and partial scripts in SPSS. Not to
mention a horrible license policy and a visualization unit that is
simply embarrassing for a product that is in its 18th or 19th version.

Ralf

On Thu, Feb 25, 2010 at 1:11 PM, Tal Galili tal.gal...@gmail.com wrote:

My biggest stumbling block to getting up and running with R was whenever I
was lazy and impatient.

The more you love R, the more it loves you back.

Tal




Contact Details:
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)




On Thu, Feb 25, 2010 at 7:31 PM, Patrick Burns pbu...@pburns.seanet.com wrote:


* What were your biggest misconceptions or
stumbling blocks to getting up and running
with R?

* What documents helped you the most in this
initial phase?

I especially want to hear from people who are
lazy and impatient.

Feel free to write to me off-list.  Definitely
write off-list if you are just confirming what
has been said on-list.

--
Patrick Burns
pbu...@pburns.seanet.com
http://www.burns-stat.com
(home of 'The R Inferno' and 'A Guide for the Unwilling S User')



Re: [R] two questions for R beginners

2010-02-25 Thread Patrick Burns

Apparently I need to explain the lazy and
impatient comment.  No offence was intended
(quite the contrary).  The meaning of it is
that the higher your level of frustration,
the more valuable your comments are likely to
be to me.

On 25/02/2010 17:31, Patrick Burns wrote:

* What were your biggest misconceptions or
stumbling blocks to getting up and running
with R?

* What documents helped you the most in this
initial phase?

I especially want to hear from people who are
lazy and impatient.

Feel free to write to me off-list. Definitely
write off-list if you are just confirming what
has been said on-list.



--
Patrick Burns
pbu...@pburns.seanet.com
http://www.burns-stat.com
(home of 'The R Inferno' and 'A Guide for the Unwilling S User')



Re: [R] two questions for R beginners

2010-02-25 Thread Liviu Andronic
On 2/25/10, Patrick Burns pbu...@pburns.seanet.com wrote:
 * What were your biggest misconceptions or
  stumbling blocks to getting up and running
  with R?

  * What documents helped you the most in this
  initial phase?

  I especially want to hear from people who are
  lazy and impatient.

I'm quite resilient so I don't think I got to the point of
frustration, but getting up to speed was a lengthy process. The
biggest stumbler was getting onto the console, and not knowing what to
do next. (My first encounter with stats was SPSS, so it was similar to
getting onto a UNIX virtual console after a life-long experience with
point-and-click windows: it's not very reassuring to know that there
are man pages.) I stayed in the what-do-I-do-next state of mind for
about 6-12 months (I learned R myself, and my professors were quite
reticent when I first introduced them to R).

Of particular help to making progress were JGR (arguments suggestions,
editor with syntax highlighting, object browser, etc.), Rcmdr (quick
access to examples for performing specific tasks, etc.) and Sweave +
LyX (for easy results transfer and report creation, without the burden
of learning LaTeX). For graphics, playwith, latticist and rggobi come
in very handy. From the documentation, right now I can recall Quick-R
and "R for SAS and SPSS Users". And of course, RSiteSearch (also via
the sos package), Rseek and the vignettes are a must.

Regards
Liviu



Re: [R] two questions for R beginners

2010-02-25 Thread Greg Snow
Patrick,

I would add one more question:

* where did you look for help expecting answers, but did not find them?

If you add hubris to laziness and impatience, you have Larry Wall's 3 virtues 
of a programmer.

To new users of R who may not understand why Patrick is asking:

Patrick Burns is the author of some great tutorials/references on S/R and is 
probably looking for questions to answer in his next contribution.

Lately there have been a large number of questions on some fairly basic issues 
(and some rather complex issues that people expected to be simple/basic).  My 
initial response (and probably others as well) to some of these requests was to 
quickly think that the answer is obvious and that the obvious place to look is 
..., but then I realize that I am a high school dropout who has been using S/R 
for over 20 years, majored in statistics but reads Shakespeare for fun, and 
have been known to saw people in half for the entertainment of others; so I am 
probably not representative of most beginners.  Fortune(89) probably applies 
here.  If R beginners will share their frustrations, where they looked but did 
not find answers (and why they looked there), what would have helped them, etc., 
then we (well, probably Patrick mostly) can do more to help the next set of 
beginners.

It does not matter how good our answers are if they answer the wrong questions 
or are in places that the questioner never sees them.

The best way to spread information is to tell someone that it is a secret, the 
best way to keep it secret is to put it in a manual.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of Patrick Burns
 Sent: Thursday, February 25, 2010 10:31 AM
 To: r-help@r-project.org
 Subject: [R] two questions for R beginners
 
 * What were your biggest misconceptions or
 stumbling blocks to getting up and running
 with R?
 
 * What documents helped you the most in this
 initial phase?
 
 I especially want to hear from people who are
 lazy and impatient.
 
 Feel free to write to me off-list.  Definitely
 write off-list if you are just confirming what
 has been said on-list.
 
 --
 Patrick Burns
 pbu...@pburns.seanet.com
 http://www.burns-stat.com
 (home of 'The R Inferno' and 'A Guide for the Unwilling S User')
 


Re: [R] two questions for R beginners

2010-02-25 Thread Peter Dalgaard

Patrick Burns wrote:

* What were your biggest misconceptions or
stumbling blocks to getting up and running
with R?

* What documents helped you the most in this
initial phase?

I especially want to hear from people who are
lazy and impatient.


Can't be bothered with questionnaires and can't wait to see your next 
book... ;-)


-pd


--
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - (p.dalga...@biostat.ku.dk)  FAX: (+45) 35327907



Re: [R] two questions for R beginners

2010-02-25 Thread Carl Witthoft

Well, here goes...

I still wish there were a really good monograph on the use and 
implementation of factors.
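
In the absence of such a monograph, here is a very small sketch of how a factor is stored, namely integer codes plus a vector of level labels, along with the pitfall that follows from it. The values are made up:

```r
f <- factor(c("10", "20", "10", "30"))

lev <- levels(f)          # the distinct labels, sorted
codes <- as.integer(f)    # the underlying integer codes, indexing into lev

# Common pitfall: as.numeric() on a factor returns the codes, not the labels.
wrong <- as.numeric(f)
right <- as.numeric(as.character(f))
```

Here wrong is 1 2 1 3 while right is 10 20 10 30, which is usually what was intended.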


I had to do a certain amount of digging to learn that {assign, get, 
eval, expression, call, parse, deparse} all existed and how they play 
together.  Sometimes they look like the C language's indirect 
addressing, *foo and &foo, and sometimes they don't. :-)
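
A rough sketch of how some of these functions fit together; the variable name foo and the expressions are arbitrary, and this is only one way of looking at it:

```r
assign("foo", 42)          # create a variable whose name is held in a string
val <- get("foo")          # look a variable up by its name

e <- parse(text = "foo + 1")[[1]]   # text -> unevaluated call
res <- eval(e)                       # call -> value
txt <- deparse(e)                    # call -> text again

cl <- call("sum", 1, 2)    # build a call from a function name and arguments
res_call <- eval(cl)

q <- quote(foo * 2)                 # capture code without evaluating it
res2 <- eval(q, list(foo = 10))     # evaluate with a different binding for foo
```

So assign() and get() go between names and values, parse() and deparse() go between text and code, and eval() goes from code to a value.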


Remembering exactly what  y~x  can do and what it can't took a while.

Learning about, and watching for, 'lazy evaluation,' especially in 
variables passed to a function, was a bit of a surprise.
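
A minimal sketch of the behaviour, with two made-up functions f and g. Arguments are only evaluated when the function body first uses them:

```r
# The second argument is never touched when x > 0, so stop() never runs.
f <- function(x, y) {
  if (x > 0) "y was never needed" else y
}
out <- f(1, stop("this argument is never evaluated"))

# Default arguments are evaluated lazily too, inside the function,
# so they can refer to variables that are only created in the body.
g <- function(n, half = n_doubled / 4) {
  n_doubled <- 2 * n
  half
}
res <- g(10)
```

The surprise usually comes the other way round: an argument with a side effect, or one that depends on a variable changed later, is evaluated at first use rather than at the call.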


And to echo others, R-inferno has been invaluable, along with the 
Zoonek manual.


Carl



Re: [R] two questions for R beginners

2010-02-25 Thread Albert-Jan Roskam
 The best way to spread information is to tell someone that it is a secret, 
 the best way to keep it secret is to put it in a manual.

== Nice quote. ;-) The problem is not that there's too little information, 
rather there's so much. That's probably because R is so powerful, but it makes 
it tough to sieve out the relevant bits. Some of the info is way too technical 
to be practical. If I want to drive a car I do not necessarily need to know all 
the nitty gritty about engine technology.

 * What were your biggest misconceptions or
 stumbling blocks to getting up and running
 with R?

== That R can't deal very well with large data, which is not entirely untrue. 
Also, I was learning another language (Python) and I didn't want R to interfere 
with that. Finally, in a working environment, it's almost impossible to justify 
the time 'lost' learning a new language. Managers generally don't give a %$# 
about the beauty and robustness of a language. They just want to get the job 
done asap.


 * What documents helped you the most in this
 initial phase?

== Many docs. CRAN documents (pdfs), other tutorials, Bob Muenchen's book. 
Many docs == many angles == a good way to learn things.

 I especially want to hear from people who are
 lazy and impatient.

== Lazy? n/a. Impatient? Yup, guilty as charged.

 Feel free to write to me off-list.  Definitely
 write off-list if you are just confirming what
 has been said on-list.

Cheers!!

Albert-Jan



~~

In the face of ambiguity, refuse the temptation to guess.

~~

--- On Thu, 2/25/10, Greg Snow greg.s...@imail.org wrote:

From: Greg Snow greg.s...@imail.org
Subject: Re: [R] two questions for R beginners
To: Patrick Burns pbu...@pburns.seanet.com, r-help@r-project.org 
r-help@r-project.org
Date: Thursday, February 25, 2010, 9:42 PM

Patrick,

I would add one more question:

* where did you look for help expecting answers, but did not find them?

If you add hubris to laziness and impatience, you have Larry Wall's 3 virtues 
of a programmer.

To new users of R who may not understand why Patrick is asking:

Patrick Burns is the author of some great tutorials/references on S/R and is
probably looking for questions to answer in his next contribution.

Lately there have been a large number of questions on some fairly basic issues
(and some rather complex issues that people expected to be simple/basic).  My
initial response (and probably that of others as well) to some of these
requests was to quickly think that the answer is obvious and that the obvious
place to look is ..., but then I realize that I am a high school dropout who
has been using S/R for over 20 years, majored in statistics but reads
Shakespeare for fun, and has been known to saw people in half for the
entertainment of others; so I am probably not representative of most
beginners.  Fortune(89) probably applies here.  If R beginners will share their
frustrations, where they looked but did not find answers (and why they looked
there), what would have helped them, etc., then we (well, probably Patrick
mostly) can do more to help the next set of beginners.

It does not matter how good our answers are if they answer the wrong questions
or are in places that the questioner never sees them.

The best way to spread information is to tell someone that it is a secret, the 
best way to keep it secret is to put it in a manual.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111




Re: [R] two questions for R beginners

2010-02-25 Thread Sharpie


Patrick Burns wrote:
 
 * What were your biggest misconceptions or
 stumbling blocks to getting up and running
 with R?
 

R was the first scripting language that I *really* invested time in
learning.  Prior to R I had a few years' experience programming in Fortran
and had worked on a few projects using Matlab.  Because most of my
programming experience was with Fortran, the toughest things to get my head
around were definitely lexical scoping and the fact that, unlike Fortran
subroutines, R function results had to be assigned to something in order to
persist outside of the function.
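
A minimal sketch of both points, for readers coming from a similar background.
The names scale_to_unit and make_counter are made up for illustration and do
not come from any package.

```r
# Hypothetical helper: rescales a numeric vector so that its maximum is 1.
scale_to_unit <- function(v) {
  v / max(v)   # the value of the last expression is what the function returns
}

x <- c(2, 4, 8)
scale_to_unit(x)        # computes a result, but x itself is not changed
x_before <- x           # still 2 4 8

x <- scale_to_unit(x)   # the result persists only because it is assigned

# Lexical scoping: the inner function finds 'count' in the environment
# where it was defined, not in the environment it is called from.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1   # updates 'count' in the enclosing environment
    count
  }
}

counter <- make_counter()
first  <- counter()
second <- counter()
```

Each call to make_counter() creates a fresh environment, so two counters built
this way would not share their count.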


Patrick Burns wrote:
 
 * What documents helped you the most in this
 initial phase?
 

Definitely the "An Introduction to R" manual that ships with the core
distribution.  It helped me translate my knowledge of programming concepts
to the R language very quickly.






Re: [R] two questions for R beginners

2010-02-25 Thread Tal Galili
My biggest stumbling blocks to getting up and running with R came whenever I
was lazy and impatient.

The more you love R, the more it loves you back.

Tal




Contact Details:
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)







Re: [R] two questions for R beginners

2010-02-25 Thread RICHARD M. HEIBERGER
On Thu, Feb 25, 2010 at 5:39 PM, Carl Witthoft c...@witthoft.com wrote:
 Well, here goes...

 I still wish there were a really good monograph on the use and
 implementation of factors.

To get a good handle on factors, and the sets of contrasts they encode,
it is really necessary to study a good statistics book.  I recommend mine

Statistical Analysis and Data Display, An Intermediate Course with
Examples in S-Plus, R, and SAS,
Richard M. Heiberger and Burt Holland, Springer 2004

But I will acknowledge that other books are available.
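
As a small illustration of what "the sets of contrasts they encode" means in
practice, here is a sketch with a made-up factor called dose. It assumes R's
default contrast options, which use treatment contrasts for unordered factors.

```r
# Illustrative data only; 'dose' is a hypothetical variable.
dose <- factor(c("low", "high", "mid", "low"),
               levels = c("low", "mid", "high"))

lev   <- levels(dose)       # the labels, in the order given by 'levels'
codes <- as.integer(dose)   # the integer codes stored underneath the labels
cm    <- contrasts(dose)    # contrast matrix used when 'dose' enters a model

print(lev)
print(codes)
print(cm)
```

The contrast matrix has one row per level and one column per non-reference
level; the first level ("low" here) serves as the baseline.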

Rich



Re: [R] two questions for R beginners

2010-02-25 Thread Ralf B
My biggest blocker was my misconception that R is extremely difficult
to start with. It is powerful and one can do very complicated things
(that consequently turn things complicated) but it comes with very
nice defaults and one can produce great results with standard tasks in
very little time - especially if one has done programming and/or
scripting before.

I pushed it away for too long that way. I wish I had used it
years ago and avoided SPSS altogether - I must have wasted 100s of hours
doing repetitive tasks by clicking and with partial scripts in SPSS. Not
to mention a horrible license policy and a visualization unit that is
simply embarrassing for a product that is in its 18th or 19th version.

Ralf




[R] two questions for R beginners

2010-02-25 Thread Patrick Burns

* What were your biggest misconceptions or
stumbling blocks to getting up and running
with R?

* What documents helped you the most in this
initial phase?

I especially want to hear from people who are
lazy and impatient.

Feel free to write to me off-list.  Definitely
write off-list if you are just confirming what
has been said on-list.

--
Patrick Burns
pbu...@pburns.seanet.com
http://www.burns-stat.com
(home of 'The R Inferno' and 'A Guide for the Unwilling S User')



Re: [R] two questions for R beginners

2010-02-25 Thread Saeed Abu Nimeh
On Thu, Feb 25, 2010 at 9:31 AM, Patrick Burns pbu...@pburns.seanet.com wrote:
 * What were your biggest misconceptions or
 stumbling blocks to getting up and running
 with R?

1- Compared to other programming languages it is hard to learn R by
example, because it is hard to find code on the web that does the
exact thing you are looking for, though sometimes you might get lucky.
By contrast, Perl, for example, is an easy language to learn by
example.

2- The R mailing list. Beginners get frustrated after they struggle
for a long time to solve a problem and the easiest thing then is to
send an email to the R mailing list. I did this in the past. The best
thing that happened was that my request was neglected and I had to
spend more time on the problem and eventually find a solution by
myself. Do not get me wrong, I am not saying that the mailing list
is bad, but it should be more organized. Maybe it could be broken down
into a couple of other mailing lists. This might bring up a good
discussion thread.


 * What documents helped you the most in this
 initial phase?

An Introduction to R by Venables
simpleR – Using R for Introductory Statistics by Verzani



Re: [R] two questions for R beginners

2010-02-25 Thread Dieter Menne


Patrick Burns wrote:
 
 * What were your biggest misconceptions or
 stumbling blocks to getting up and running
 with R?
 
 
(This derives partly from teaching)

The fact that this xapply-stuff was not idempotent (worse: not always) and
that you need a monster like do.call() to straighten this out. Nowadays,
plyr comes close.
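
A minimal sketch of the behaviour that seems to be meant here, on the
assumption that the complaint is about the shape of the result changing with
the input. The data and variable names are made up.

```r
# Illustrative input: a named list of integer vectors.
pieces <- list(a = 1:3, b = 4:6, c = 7:10)

# sapply() simplifies when it can, so the result type depends on the input.
as_vector <- sapply(pieces, mean)                      # named numeric vector
as_matrix <- sapply(pieces, range)                     # 2 x 3 matrix
as_list   <- sapply(pieces, function(v) v[v > 5])      # list: lengths differ

# lapply() always returns a list; do.call(rbind, ...) stacks the
# elements into one matrix with a row per list element.
stats  <- lapply(pieces, function(v) c(mean = mean(v), n = length(v)))
result <- do.call(rbind, stats)
print(result)
```

do.call(rbind, stats) is equivalent to writing rbind(stats$a, stats$b,
stats$c) by hand, which is why it is the usual way to flatten the output of
lapply().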

The concept of environment. With S it was worse, though.

That you cannot change values passed by reference. I noted that the latter
is no problem for students who have not worked with C(++/#) before. That
there is only one return result in functions.
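
Both points can be sketched in a few lines. The function names double_first
and summarise_vec are hypothetical, and returning a list is only one common
way around the single return value.

```r
# Arguments behave like copies: changes inside the function stay inside.
double_first <- function(v) {
  v[1] <- v[1] * 2   # modifies the function's own copy of v
  v
}

x <- c(1, 2, 3)
y <- double_first(x)   # x is still 1 2 3; y is 2 2 3

# A function returns one object, so several results are usually
# bundled into a list and picked apart by name.
summarise_vec <- function(v) {
  list(mean = mean(v), range = range(v))
}

res <- summarise_vec(x)
print(res$mean)
print(res$range)
```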

[ and the like as operators.

10 years ago, when I started, the message was: S4 is the future, S3 is
legacy. So I learned S4. Only to never use it in self-written code later.
Might be different for BioConductor people.

That sometimes you can use vectors not in data= (lattice), and sometimes not
(ggplot2). Still a VERY confusing inconsistency.

The why-does-this-not-print FAQ.
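
This presumably refers to auto-printing: R prints the value of an expression
only at top level, not inside loops or function bodies. A minimal sketch using
plain numbers follows; the same rule is why lattice and ggplot2 plots need an
explicit print() inside a loop or function.

```r
# Inside a loop, a bare expression is evaluated but nothing is printed.
silent <- capture.output(for (i in 1:3) i)

# An explicit print() produces the output.
loud <- capture.output(for (i in 1:3) print(i))

length(silent)   # number of output lines captured without print()
length(loud)     # number of output lines captured with print()
```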

Why does par(oma..) not work with lattice?

Dieter

