Re: [R] stacking data frames with different variables

2007-09-09 Thread Muenchen, Robert A (Bob)
Perfect. Thanks Hadley!


 -Original Message-
 From: hadley wickham [mailto:[EMAIL PROTECTED]
 Sent: Sunday, September 09, 2007 10:11 AM
 To: Muenchen, Robert A (Bob)
 Cc: r-help@stat.math.ethz.ch
 Subject: Re: [R] stacking data frames with different variables
 
 Have a look at rbind.fill in the reshape package.
 
 Hadley
 
 On 9/9/07, Muenchen, Robert A (Bob) [EMAIL PROTECTED] wrote:
  Hi All,
 
  If I need to stack two data frames, I can use rbind, but it requires
  that all variables exist in both sets. I can make that happen, but other
  stat packages would figure out where the differences were, add the
  missing variables to each, set their values to missing and stack them.
  Is there a more automatic way to do that in R?
 
  Below is an example program.
 
  Thanks,
  Bob
 
  # Top data frame has two variables.
  x <- c(1,2)
  y <- c(1,2)

  top <- data.frame(x,y)
  top

  # Bottom data frame has only one of them.
  x <- c(3,4)
  bottom <- data.frame(x)
  bottom

  # So rbind won't work.
  rbind(top, bottom)

  # After figuring out where the mismatches are I can
  # make the two DFs the same manually.
  bottom <- data.frame( bottom, y=NA)
  bottom

  # Now I get the desired result.
  both <- rbind(top,bottom)
  both
 
  =
  Bob Muenchen (pronounced Min'-chen), Manager
  Statistical Consulting Center
  U of TN Office of Information Technology
  200 Stokely Management Center, Knoxville, TN 37996-0520
  Voice: (865) 974-5230
  FAX: (865) 974-4810
  Email: [EMAIL PROTECTED]
  Web: http://oit.utk.edu/scc,
  News: http://listserv.utk.edu/archives/statnews.html
 
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
 
 --
 http://had.co.nz/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
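
The manual steps in the example program (find the mismatched variables, add them as NA, then rbind) can be generalized. The following is a minimal base R sketch of that idea, not the implementation of rbind.fill; the helper name stack_fill is hypothetical.

```r
# Hypothetical helper: stack two data frames whose columns differ,
# filling the columns that each one lacks with NA.
stack_fill <- function(a, b) {
  all_names <- union(names(a), names(b))
  for (nm in setdiff(all_names, names(a))) a[[nm]] <- NA
  for (nm in setdiff(all_names, names(b))) b[[nm]] <- NA
  # Put the columns in the same order before stacking.
  rbind(a[all_names], b[all_names])
}

top    <- data.frame(x = c(1, 2), y = c(1, 2))
bottom <- data.frame(x = c(3, 4))
both   <- stack_fill(top, bottom)
both
```

For more than two data frames, or for factor columns whose levels differ, the packaged function suggested in the thread is probably the safer choice.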


Re: [R] NAs in indices

2007-09-04 Thread Muenchen, Robert A (Bob)
Thanks to both Charles and Jim for such helpful info. 

The help file ?"[.data.frame" is just great. Too bad it is so hard to
find!

I had used na.strings on read.table but had gotten it in my head that it
was for numeric missing value codes. But of course, "strings is
strings!" That took care of periods everywhere & I was able to use my
original approach to get rid of some 99's and 999's that applied only to
certain columns (na.strings would zap them for all columns).

Jim's suggestion to add which() makes perfect sense. I really don't like
the idea of referencing x[NA] even though x[c(T,T,F,F,NA,F)] might make
it obvious which were wanted. I'm surprised I didn't get caught by that
long ago.

Cheers,
Bob

 
 -Original Message-
 From: Charles C. Berry [mailto:[EMAIL PROTECTED]
 Sent: Sunday, September 02, 2007 2:33 PM
 To: Muenchen, Robert A (Bob)
 Cc: r-help@stat.math.ethz.ch
 Subject: Re: [R] NAs in indices
 
 On Sun, 2 Sep 2007, Muenchen, Robert A (Bob) wrote:
 
  Hi All,
 
  I'm fiddling with a program to read a text file containing periods that
  SAS uses for missing values. I know that if I had the original SAS data
  set instead of a text file, R would handle this conversion for me.

  Data frames do not allow missing values in their indices but vectors do.
  Why is that? A search of the error message points out the problem and
  solution but not why they differ. A simplified program that demonstrates
  the issue is below.
 
  Thanks,
  Bob
 
  # Here's a data frame that has both periods and NAs.
  # I want sex to remain character for now.

  sex=c("m","f",".",NA)
  x=c(1,2,3,NA)
  myDF <- data.frame(sex,x,stringsAsFactors=F)
  rm(sex,x)
  myDF

  # Substituting NA into data frame does not work
  # due to NAs in the indices. The error message is:
  # missing values are not allowed in subscripted assignments of data
  # frames

  myDF[ myDF$sex==".", "sex" ] <- NA
  myDF

  # This works because myDF$sex is a vector and vectors allow NAs in
  # indexes.
  # Why don't data frames allow this?

  myDF$sex[ myDF$sex=="." ] <- NA
  myDF
 
 
 R version 2.5.1  'allows' it.


  df <- as.data.frame(diag(3)[,-1])
  df[ df[,1]==1 ] <- NA
  df

 but the result may not be what you were expecting. See

    ?"[.data.frame"

 (esp. Details) for more info on why it does not 'work' as you expected.


 Also, since you mention a 'text file' I suggest you look at

    ?read.table

 or

    ?scan

 where you will see that

    dots.are.NA <- read.table(my.file, na.strings = '.' )

 may help you.
 
 Chuck
 
 
 
 Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
 E mailto:[EMAIL PROTECTED]                  UC San Diego
 http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901


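
The na.strings approach discussed above can be sketched as follows. This is an illustrative example only: the inline text stands in for the SAS-style file, and the value 99 is assumed to be a missing-value code that applies to one column but not the others.

```r
# Inline stand-in for a SAS-style text file; "." marks a missing value.
txt <- "sex x\nm 1\nf 2\n. 99\nm 3"

# na.strings converts every "." to NA while the file is read.
myDF <- read.table(text = txt, header = TRUE, na.strings = ".",
                   stringsAsFactors = FALSE)

# 99 is a missing-value code for x only, so recode that column by itself.
# %in% never returns NA, so the index is safe even if x already holds NAs.
myDF$x[myDF$x %in% 99] <- NA
myDF
```

Adding "99" to na.strings instead would turn 99 into NA in every column, which is why the column-specific codes are handled after reading.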


Re: [R] Comparing transform to with

2007-09-04 Thread Muenchen, Robert A (Bob)
Gabor, 

That's very nice! I like your my.transform much better. Too bad about
the incompatibility. Swapping that out would no doubt break some
existing programs. I love that old joke, "God was able to create the
universe in just 6 days only because he didn't have an installed base to
worry about!"

Cheers,
Bob

 -Original Message-
 From: Gabor Grothendieck [mailto:[EMAIL PROTECTED]
 Sent: Sunday, September 02, 2007 10:47 AM
 To: Muenchen, Robert A (Bob)
 Cc: r-help@stat.math.ethz.ch
 Subject: Re: [R] Comparing transform to with
 
 Try this version of transform.  In the first test we show
 it works on your example but we have used the head of the built in
 anscombe data set.  The second and third show that
 it necessarily is incompatible with transform because transform
 always looks up variables in DF first whereas my.transform looks
 up the computed ones first.
 
 my.transform <- function(DF, ...) {
   f <- function(){}
   formals(f) <- eval(substitute(as.pairlist(c(alist(...), DF))))
   body(f) <- substitute(modifyList(DF, data.frame(...)))
   f()
 }

 # test
 a <- head(anscombe)
 # 1
 my.transform(a, sum1 = x1+x2+x3+x4, sum2 = y1+y2+y3+y4, total = sum1+sum2)
 # 2
 my.transform(a, y2 = y1, y3 = y2)
 # 3
 transform(a, y2 = y1, y3 = y2) # different
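
Two simpler alternatives to a custom transform are sketched below using the data from the original question. The first applies transform() in two passes; the second uses within(), which assumes an R version that provides that function. Neither is part of the reply above.

```r
myDF <- data.frame(x1 = c(1, 3, 3), x2 = c(3, 2, 1),
                   x3 = c(2, 5, 2), x4 = c(5, 6, 9))

# Two passes with transform(): the second call can see sum1 and sum2
# because they are already columns of step1.
step1   <- transform(myDF, sum1 = x1 + x2, sum2 = x3 + x4)
twoPass <- transform(step1, total = sum1 + sum2)

# within() runs the assignments in order, so total can use sum1 and sum2,
# and it returns a modified copy of the data frame.
viaWithin <- within(myDF, {
  sum1  <- x1 + x2
  sum2  <- x3 + x4
  total <- sum1 + sum2
})

twoPass
viaWithin
```

The two results hold the same values; within() may place the new columns in a different order than transform() does.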
 
 


[R] NAs in indices

2007-09-02 Thread Muenchen, Robert A (Bob)
Hi All,

I'm fiddling with a program to read a text file containing periods that
SAS uses for missing values. I know that if I had the original SAS data
set instead of a text file, R would handle this conversion for me. 

Data frames do not allow missing values in their indices but vectors do.
Why is that? A search of the error message points out the problem and
solution but not why they differ. A simplified program that demonstrates
the issue is below.

Thanks,
Bob

# Here's a data frame that has both periods and NAs.
# I want sex to remain character for now.

sex=c("m","f",".",NA)
x=c(1,2,3,NA)
myDF <- data.frame(sex,x,stringsAsFactors=F)
rm(sex,x)
myDF

# Substituting NA into data frame does not work
# due to NAs in the indices. The error message is:
# missing values are not allowed in subscripted assignments of data
# frames

myDF[ myDF$sex==".", "sex" ] <- NA
myDF

# This works because myDF$sex is a vector and vectors allow NAs in
# indexes.
# Why don't data frames allow this?

myDF$sex[ myDF$sex=="." ] <- NA
myDF

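
The failing assignment above has an NA in its logical row index, because comparing NA with "." yields NA. A minimal sketch of the which() workaround mentioned elsewhere in this thread, using the same data frame:

```r
sex  <- c("m", "f", ".", NA)
x    <- c(1, 2, 3, NA)
myDF <- data.frame(sex, x, stringsAsFactors = FALSE)

hit  <- myDF$sex == "."   # FALSE FALSE TRUE NA: the comparison with NA is NA
rows <- which(hit)        # which() keeps only the TRUE positions and drops NA

# The row index now contains no missing values, so the data frame
# assignment is accepted.
myDF[rows, "sex"] <- NA
myDF
```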


[R] Comparing transform to with

2007-09-01 Thread Muenchen, Robert A (Bob)
Hi All,

I've been successfully using the with function for analyses and the
transform function for multiple transformations. Then I thought, why not
use with for both? I ran into problems & couldn't figure them out from
help files or books. So I created a simplified version of what I'm
doing:

rm( list=ls() )
x1<-c(1,3,3)
x2<-c(3,2,1)
x3<-c(2,5,2)
x4<-c(5,6,9)
myDF<-data.frame(x1,x2,x3,x4)
rm(x1,x2,x3,x4)
ls()
myDF

This creates two new variables just fine

transform(myDF,
  sum1=x1+x2,
  sum2=x3+x4
)

This next code does not see sum1, so it appears that transform cannot
see the variables that it creates. Would I need to transform new
variables in a second pass?

transform(myDF,
  sum1=x1+x2,
  sum2=x3+x4,
  total=sum1+sum2
)

Next I'm trying the same thing using with. It does not work but
also does not generate error messages, giving me the impression that I'm
doing something truly idiotic:

with(myDF, {
  sum1<-x1+x2
  sum2<-x3+x4
  total <- sum1+sum2
} )
myDF
ls()

Then I thought, perhaps one of the advantages of transform is that it
works on the left side of the equation without using a longer name like
myDF$sum1. with probably doesn't do that, so I use the longer form
below. It also does not work and generates no error messages. 

# Try it again, writing vars to myDF explicitly.
# It generates no errors, and no results.
with(myDF, {
  myDF$sum1<-x1+x2
  myDF$sum2<-x3+x4
  myDF$total <- myDF$sum1+myDF$sum2
} )
myDF
ls()

I would appreciate some advice about the relative roles of these two
functions & why my attempts with with have failed.

Thanks!
Bob

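
A short sketch of why the with() attempts above leave myDF unchanged: with() evaluates its expression in a temporary environment built from the data frame and returns the value of that expression, so assignments made inside the braces are discarded when the call returns. Assigning to myDF$sum1 inside the braces does not help either, since it changes only a local copy of myDF. The example uses a reduced version of the data.

```r
myDF <- data.frame(x1 = c(1, 3, 3), x2 = c(3, 2, 1))

# The assignment to sum1 happens in a temporary environment;
# only the value of the last expression comes back.
result <- with(myDF, {
  sum1 <- x1 + x2
  sum1
})
result
"sum1" %in% names(myDF)   # myDF has not gained a column

# To keep the value, assign the result of with() from the outside.
myDF$sum1 <- with(myDF, x1 + x2)
myDF
```

In short, with() suits computing a value from the columns, while transform() returns a modified copy of the data frame.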


Re: [R] subset using noncontiguous variables by name (not index)

2007-08-27 Thread Muenchen, Robert A (Bob)
Gabor, That works great!

I think this would be a very helpful addition to the main R
distribution. Perhaps with a single colon representing numerical order
(exactly as you have written it) and two colons representing the order
of the variables as they appear in the data frame (your first example).
That's analogous to SAS' x1-xN, which you know gets those N variables,
and a--z, which selects an unknown number of variables a through z. How
many that is depends upon their order in the data frame. That would not
only be very useful in general, but it would also make transitioning to
R from SAS or SPSS less confusing.

Is R still being extended in such basic ways, or does that muck up
existing programs too much?

Thanks,
Bob

 -Original Message-
 From: Gabor Grothendieck [mailto:[EMAIL PROTECTED]
 Sent: Sunday, August 26, 2007 8:52 PM
 To: Muenchen, Robert A (Bob)
 Cc: r-help@stat.math.ethz.ch
 Subject: Re: [R] subset using noncontiguous variables by name (not
 index)
 
 Try this:
 
 > "%:%" <- function(x, y) {
 +    prex <- gsub("[0-9]", "", x); postx <- gsub("[^0-9]", "", x)
 +    prey <- gsub("[0-9]", "", y); posty <- gsub("[^0-9]", "", y)
 +    stopifnot(prex == prey)
 +    paste(prex, seq(from = as.numeric(postx), to = as.numeric(posty)), sep = "")
 + }
 > "x2" %:% "x4"
 [1] "x2" "x3" "x4"
 
 
[R] FW: subset using noncontiguous variables by name (not index)

2007-08-27 Thread Muenchen, Robert A (Bob)
Thomas, that's a good point. I was thinking of anscombe[x1::y1] making
it clear which one, but you would then want just x1::y1 to have
unambiguous meaning on its own, which is impossible.

As for x1:xN, it's unambiguous on its own. I thought one of the great
advantages of R was that it could use different methods so that a new
operator would not be needed. The colon operator would just have a new
method for when stringN appeared. One that would be very useful & have
obvious meaning.

Thanks,
Bob

 -Original Message-
 From: Thomas Lumley [mailto:[EMAIL PROTECTED]
 Sent: Monday, August 27, 2007 10:25 AM
 To: Muenchen, Robert A (Bob)
 Cc: r-help@stat.math.ethz.ch
 Subject: Re: [R] subset using noncontiguous variables by name (not
 index)
 
 On Mon, 27 Aug 2007, Muenchen, Robert A (Bob) wrote:
 
  Gabor, That works great!
 
  I think this would be a very helpful addition to the main R
  distribution. Perhaps with a single colon representing numerical order
  (exactly as you have written it) and two colons representing the order
  of the variables as they appear in the data frame (your first example).
  That's analogous to SAS' x1-xN, which you know gets those N variables,
  and a--z, which selects an unknown number of variables a through z. How
  many that is depends upon their order in the data frame. That would not
  only be very useful in general, but it would also make transitioning to
  R from SAS or SPSS less confusing.

  Is R still being extended in such basic ways, or does that muck up
  existing programs too much?
 
 
 In principle base R can be extended like that, but a strong case is needed
 for non-standard evaluation rules and for depleting the restricted supply
 of short binary operator names.

 The reason for subset() and its behaviour is that 'variables as they
 appear in the data frame' is typically ambiguous -- which data frame? In
 SPSS you have only one and in SAS there is a default one, so there is no
 ambiguity in X1--Y2, but in R it needs another argument specifying the
 data frame, so it can't really be a binary operator.

 The double colon :: and triple colon ::: are already used for namespaces,
 and a search of r-help reveals two previous, different, suggestions for
 %:%.
 
 
   -thomas
 
 Thomas Lumley Assoc. Professor, Biostatistics
 [EMAIL PROTECTED] University of Washington, Seattle



Re: [R] subset using noncontiguous variables by name (not index)

2007-08-27 Thread Muenchen, Robert A (Bob)
Thanks for helping me see why R doesn't have the obvious! -Bob

 -Original Message-
 From: Thomas Lumley [mailto:[EMAIL PROTECTED]
 Sent: Monday, August 27, 2007 2:12 PM
 To: Muenchen, Robert A (Bob)
 Subject: RE: [R] subset using noncontiguous variables by name (not
 index)
 
 On Mon, 27 Aug 2007, Muenchen, Robert A (Bob) wrote:
 
  Thomas, that's a good point. I was thinking of anscombe[x1::y1] making
  it clear which one, but you would then want just x1::y1 to have
  unambiguous meaning on its own, which is impossible.
 
  As for x1:xN, it's unambiguous on its own.
 
 
 It actually isn't. We already have a meaning. Consider
    x1<-4
    xN<-6
    x1:xN
 It also breaks R's argument passing rules by treating x1 as string
 rather than a name.

 What would be unambiguous at the moment is "x1":"x4", provided there
 was a sufficiently precise set of rules on what was allowed. Consider
   "x1":"x-1"         (negative?)
   "x1":"x3.14"       (non-integer?)
   "x3.12":"x3.14"    (is the prefix x or x3.?)
   "x1":"X4"          (the prefix changes)
   "01":"14"          (is the prefix empty or 0?)
   "x09":"xA2"        (is this illegal decimal or legal hexadecimal?)
   "IL23R1":"IL23R4"  (what is the prefix?)
   "x1a":"x4a"        (infix numbering?)
 
 
 
   -thomas
 
 Thomas Lumley Assoc. Professor, Biostatistics
 [EMAIL PROTECTED] University of Washington, Seattle


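
Both objections above can be seen in a few lines. The sketch below first shows that x1:xN already has a meaning when x1 and xN are ordinary variables, then applies the same digit-stripping rule used by the %:% function posted earlier in this thread to some of the awkward names. The helper name split_name is hypothetical.

```r
# x1:xN already means "the sequence from the value of x1 to the value of xN".
x1 <- 4
xN <- 6
x1:xN

# Hypothetical helper: split a name the way the posted %:% does,
# by deleting all digits (prefix) or all non-digits (number).
split_name <- function(name) {
  c(prefix = gsub("[0-9]", "", name),
    number = gsub("[^0-9]", "", name))
}

split_name("x2")       # prefix "x",   number "2"
split_name("IL23R1")   # prefix "ILR", number "231"
split_name("x3.14")    # prefix "x.",  number "314"
```

With that rule, "IL23R1" through "IL23R4" would expand to names such as "ILR231", which is not a column anyone intended, so any general operator would need a much more precise definition of a prefix.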


[R] subset using noncontiguous variables by name (not index)

2007-08-26 Thread Muenchen, Robert A (Bob)
Hi All,

I'm using the subset function to select a list of variables, some of
which are contiguous in the data frame, and others of which are not. It
works fine when I use the form:

subset(mydata,select=c(x1,x3:x5,x7) )

In reality, my list is far more complex. So I would like to store it in
a variable to substitute in for c(x1,x3:x5,x7) but cannot get it to
work. That use of the c function seems to violate R rules, so I'm not
sure how it works at all. A small simulation of the problem is below. 

If the variable names & orders were really this simple, I could use
indices like 

summary( mydata[ ,c(1,3:5,7) ] ) 

but alas, they are not. 

How does the c function work this way in the first place, and how can I
make this substitution?

Thanks,
Bob

mydata <- data.frame(
  x1=c(1,2,3,4,5),
  x2=c(1,2,3,4,5),
  x3=c(1,2,3,4,5),
  x4=c(1,2,3,4,5),
  x5=c(1,2,3,4,5),
  x6=c(1,2,3,4,5),
  x7=c(1,2,3,4,5)
)
mydata

# This does what I want.
summary( 
  subset(mydata,select=c(x1,x3:x5,x7) ) 
)

# Can I substitute myVars?
attach(mydata)
myVars1 <- c(x1,x3:x5,x7)

# Not looking good!
myVars1

# This doesn't do the right thing.
summary( 
  subset(mydata,select=myVars1 ) 
)

# Total desperation on this attempt:
myVars2 <- "x1,x3:x5,x7"
myVars2

# This doesn't work either.
summary( 
  subset(mydata,select=myVars2 )
)



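
One way to store the selection in a variable, sketched here as an alternative to the attempts above, is to keep the column names as a character vector and index the data frame with it. The paste() call builds the contiguous run of names; it assumes the names follow the simple x1..x7 pattern of this example.

```r
mydata <- data.frame(x1 = 1:5, x2 = 1:5, x3 = 1:5, x4 = 1:5,
                     x5 = 1:5, x6 = 1:5, x7 = 1:5)

# Build the names as strings: "x1", then "x3" to "x5", then "x7".
myVars <- c("x1", paste("x", 3:5, sep = ""), "x7")
myVars

# A character vector of names selects columns directly.
summary(mydata[myVars])
```

Unlike the unquoted c(x1, x3:x5, x7), the character vector is ordinary data, so it can be built once and reused in several calls.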


Re: [R] subset using noncontiguous variables by name (not index)

2007-08-26 Thread Muenchen, Robert A (Bob)
Thanks Bert & Gabor for two very interesting solutions!

It would be very handy in R if "string1":"stringN" generated
"string1","string2"..."stringN"; it would make selections like this much
more obvious. I know it's easy to do with the colon operator and paste
function but that's quite a step up in complexity compared to SAS' x1
x3-x4 y2 or SPSS' x1,x3 to x4, y2. And it's complexity that beginners
face early in learning R.

While on the subject of the colon operator, why doesn't anscombe[[1:4]]
select the x variables in list form as anscombe[,1:4] or anscombe[1:4]
do in data frame form?

Thanks,

Bob

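
On the anscombe[[1:4]] question above: as described in ?Extract, [[ extracts a single element, and a vector index inside [[ is applied recursively rather than selecting several columns. A short sketch of the difference, using the built-in anscombe data:

```r
xs_df   <- anscombe[1:4]            # data frame holding x1..x4
xs_list <- as.list(anscombe[1:4])   # the same four columns as a plain list
one_col <- anscombe[[1]]            # the single column x1 as a vector

# A vector inside [[ ]] indexes recursively: column 1, then element 2.
anscombe[[c(1, 2)]]
anscombe$x1[2]
```

So the list form of several columns comes from single brackets followed by as.list(), while [[1:4]] asks for an element nested four levels deep, which a data frame does not have.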


 -Original Message-
 From: Bert Gunter [mailto:[EMAIL PROTECTED]
 Sent: Sunday, August 26, 2007 6:50 PM
 To: 'Gabor Grothendieck'; Muenchen, Robert A (Bob)
 Cc: r-help@stat.math.ethz.ch
 Subject: RE: [R] subset using noncontiguous variables by name (not
 index)
 
 The problem is that x3:x5 does not mean what you think it means. The only
 reason it does the right thing in subset() is because a clever trick is used
 there (read the code -- it's not hard to understand) to ensure that it does.
 Gabor has essentially mimicked that trick in his solution.

 However, it is not necessary to do this. You can construct the call directly as
 you tried to do. Using the anscombe example, here's how:

 chooz <- "c(x1,x3:x4,y2)"  ## enclose the desired expression in quotes
 do.call ("subset", list( x = anscombe, select = parse(text = chooz)))
 
 -- Bert Gunter
 Genentech Non-Clinical Statistics
 South San Francisco, CA
 
 The business of the statistician is to catalyze the scientific
 learning
 process.  - George E. P. Box
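
The trick referred to above can be sketched in a few lines: bind each column name to its position, then evaluate the selection expression in that context so that x3:x4 becomes 3:4. This is a simplified illustration of the idea, not the actual source of subset(); the helper name pick_cols is hypothetical.

```r
# Hypothetical helper mimicking the idea behind subset()'s select argument.
pick_cols <- function(df, selection) {
  # A list such as list(x1 = 1, x2 = 2, ...): each name maps to its position.
  positions <- as.list(seq_along(df))
  names(positions) <- names(df)
  # Evaluate the selection with column names standing for positions.
  idx <- eval(parse(text = selection), positions, parent.frame())
  df[idx]
}

chooz <- "c(x1, x3:x4, y2)"
head(pick_cols(anscombe, chooz), 3)
```

Because the selection is a string here, it can be stored in a variable and reused, which is what the original question asked for.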
 
 
 
  -Original Message-
  From: [EMAIL PROTECTED]
  [mailto:[EMAIL PROTECTED] On Behalf Of Gabor
  Grothendieck
  Sent: Sunday, August 26, 2007 2:10 PM
  To: Muenchen, Robert A (Bob)
  Cc: r-help@stat.math.ethz.ch
  Subject: Re: [R] subset using noncontiguous variables by name
  (not index)
 
  Using builtin data frame anscombe try this. First we set up a data frame
  anscombe.seq which has one row containing 1, 2, 3, ... .  Then select
  out from that data frame and unlist it to get the desired index vector.

  > anscombe.seq <- replace(anscombe[1,], TRUE, seq_along(anscombe))
  > idx <- unlist(subset(anscombe.seq, select = c(x1, x3:x4, y2)))
  > anscombe[idx]
     x1 x3 x4   y2
  1  10 10  8 9.14
  2   8  8  8 8.14
  3  13 13  8 8.74
  4   9  9  8 8.77
  5  11 11  8 9.26
  6  14 14  8 8.10
  7   6  6  8 6.13
  8   4  4 19 3.10
  9  12 12  8 9.13
  10  7  7  8 7.26
  11  5  5  8 4.74
 
 

[R] Saving results from Linux command line

2007-08-24 Thread Muenchen, Robert A (Bob)
Hi All,

I'm used to running R on Windows & learning Linux. I know ESS is the way
to go in the long run, but I'm trying now to just understand the command
line. I can interactively enter commands, see the results on the screen
and save input & output to myresults.txt with this approach:

$ script myresults.txt
$ R
> ...r commands...
> q()
$ exit

I can also use the Linux tee command to do essentially the same thing.

Both of those approaches do what I want, but I assume there is a way to
do it within R. I've been through AITR Appendix B and the FAQ looking
for either a startup option or an R function to do this but I don't see
either. What am I missing?

Thanks,
Bob





Re: [R] Saving results from Linux command line

2007-08-24 Thread Muenchen, Robert A (Bob)
I certainly appreciate those advantages, but I feel I'm missing
something very basic. I would have expected a function like
save.transcript or save.console to be able to write out the console's
contents.

I see a similar situation in the Windows GUI. There is the menu choice
"Save Workspace" and the matching function save.image. In the console
window, there is the menu choice "File > Save to file" but I don't see an
equivalent function.

Are there functions for all menu choices in R?

Thanks,
Bob

 -Original Message-
 From: Richard M. Heiberger [mailto:[EMAIL PROTECTED]
 Sent: Friday, August 24, 2007 10:01 AM
 To: Muenchen, Robert A (Bob); r-help@stat.math.ethz.ch
 Subject: Re: [R] Saving results from Linux command line
 
 Go for the best and do it with ESS.
 
 ESS understands the file extension myfile.rt (not myfile.txt, which is generic)
 as an R transcript and therefore font-locks it for the R syntax and is able
 to resend multiple-line statements with a single ENTER.

 Within emacs, you can save the *R* buffer to a myfile.rt file (you can also
 save the R transcript as myfile.rt running inside a *shell* buffer,
 but that is silly at this point).

 Plus you get syntax highlighting and the other features on your
 myfile.R file.



Re: [R] Saving results from Linux command line

2007-08-24 Thread Muenchen, Robert A (Bob)
I looked long and hard for that information. Thank you VERY much! -Bob

 -Original Message-
 From: Richard M. Heiberger [mailto:[EMAIL PROTECTED]
 Sent: Friday, August 24, 2007 1:52 PM
 To: Muenchen, Robert A (Bob); r-help@stat.math.ethz.ch
 Subject: Re: [R] Saving results from Linux command line
 
 There can't be functions in the R language to save the transcript
 of a session.  In this respect R is a filter.  It takes an input
 stream of text and returns an output stream of text.  R doesn't
 remember the streams.  The Windows RGui remembers them.  The ESS *R*
 buffer remembers them.  Any terminal emulator could in principle
 remember them.  R itself can't.



Re: [R] Saving results from Linux command line

2007-08-24 Thread Muenchen, Robert A (Bob)
As the help file says, "...like the Unix program tee." I thought sink
only diverted to a file. Thanks! -Bob
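
A minimal sketch of the sink() behaviour discussed here, assuming a temporary file in place of transcript.txt. With split = TRUE the printed output goes to the file and still appears on screen. Note that sink() diverts output only; the commands themselves are not recorded, so this is not a full session transcript.

```r
# Send a copy of printed output to a file while still showing it on screen.
log_file <- tempfile(fileext = ".txt")   # stand-in for "transcript.txt"

sink(log_file, split = TRUE)             # start diverting (tee-like)
print(summary(c(1, 2, 3, 4)))
sink()                                   # stop diverting

# Read back what was captured in the file.
captured <- readLines(log_file)
```

Calling sink() with no arguments ends the diversion; forgetting it leaves later output going to the file as well.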

 -Original Message-
 From: Thomas Lumley [mailto:[EMAIL PROTECTED]
 Sent: Friday, August 24, 2007 2:17 PM
 To: Muenchen, Robert A (Bob)
 Cc: r-help@stat.math.ethz.ch
 Subject: Re: [R] Saving results from Linux command line
 
 
 There could still be functions that divert a copy of all the output to a
 file, for example.  And indeed there are.
 
 sink("transcript.txt", split=TRUE)
 
   -thomas
 
 
 Thomas Lumley Assoc. Professor, Biostatistics
 [EMAIL PROTECTED] University of Washington, Seattle



[R] length, mean, na.rm, na.omit...

2007-05-18 Thread Muenchen, Robert A (Bob)
Hi All,

Can anyone tell me why the length function does not use na.rm? I know
how to work around it, I'm just curious to know why such a useful option
was left out.

I'm also interested in the logic of setting na.rm=FALSE as the default on
mean, sd, etc. This is the opposite of the many other stat packages I
have used, so I assume it provides some programming benefit that is not
obvious to me.

Thanks,
Bob
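
For reference, a short sketch of the usual workarounds for counting and averaging non-missing values. The vector x is made up for illustration.

```r
x <- c(2, 4, NA, 6)

n_all     <- length(x)               # counts every element, including the NA
n_valid   <- sum(!is.na(x))          # number of non-missing values
n_omit    <- length(na.omit(x))      # same count, via na.omit()

m_default <- mean(x)                 # NA, because na.rm defaults to FALSE
m_valid   <- mean(x, na.rm = TRUE)   # mean of the non-missing values
```

One way to read the design: length() describes the vector itself, while na.rm controls how a summary statistic treats missing values, so the missing-value count is computed explicitly with is.na().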



[R] do.call vs. lapply for lists

2007-04-09 Thread Muenchen, Robert A (Bob)
Hi All,

I'm trying to understand the difference between do.call and lapply for
applying a function to a list. Below is one of the variations of
programs (by Marc Schwartz) discussed here recently to select the first
and last n observations per group.

I've looked in several books, the R FAQ and searched the archives, but I
can't find enough to figure out why lapply doesn't do what do.call does
in this case. The help files & newsletter descriptions of do.call sound
like it would do the same thing, but I'm sure that's due to my lack of
understanding about their specific terminology. I would appreciate it if
you could take a moment to enlighten me. 

Thanks,
Bob

mydata <- data.frame(
  id      = c('001','001','001','002','003','003'),
  math    = c(80,75,70,65,65,70),
  reading = c(65,70,88,NA,90,NA)
)
mydata

mylast <- lapply( split(mydata,mydata$id), tail, n=1)
mylast
class(mylast) #It's a list, so lapply will do *something* with it.

#This gets the desired result:
do.call(rbind, mylast)

#This doesn't do the same thing, which confuses me:
lapply(mylast,rbind)

#...and data.frame won't fix it as I've seen it do in other circumstances:
data.frame( lapply(mylast,rbind) )



Re: [R] do.call vs. lapply for lists

2007-04-09 Thread Muenchen, Robert A (Bob)
Marc,

That makes the difference between do.call and lapply crystal clear. Your
explanation would make a nice FAQ entry.

Thanks!
Bob



 -Original Message-
 From: Marc Schwartz [mailto:[EMAIL PROTECTED]
 Sent: Monday, April 09, 2007 1:06 PM
 To: Muenchen, Robert A (Bob)
 Cc: R-help@stat.math.ethz.ch
 Subject: Re: do.call vs. lapply for lists
 
 Bob,
 
 A key difference is that do.call() operates (in the above example) as if
 the actual call was:
 
  rbind(mylast[[1]], mylast[[2]], mylast[[3]])
    id math reading
 3 001   70      88
 4 002   65      NA
 6 003   70      NA
 
 In other words, do.call() takes the quoted function and passes the list
 object as if it was a list of individual arguments. So rbind() is only
 called once.
 
 In this case, rbind() internally handles all of the factor level issues,
 etc. to enable a single common data frame to be created from the three
 independent data frames contained in 'mylast':
 
  str(mylast)
 List of 3
  $ 001:'data.frame':   1 obs. of  3 variables:
   ..$ id     : Factor w/ 3 levels "001","002","003": 1
   ..$ math   : num 70
   ..$ reading: num 88
  $ 002:'data.frame':   1 obs. of  3 variables:
   ..$ id     : Factor w/ 3 levels "001","002","003": 2
   ..$ math   : num 65
   ..$ reading: num NA
  $ 003:'data.frame':   1 obs. of  3 variables:
   ..$ id     : Factor w/ 3 levels "001","002","003": 3
   ..$ math   : num 70
   ..$ reading: num NA
 
 
 On the other hand, lapply() (as above) calls rbind() _separately_ for
 each component of mylast.  It therefore acts as if the following series
 of three separate calls were made:
 
 
  rbind(mylast[[1]])
    id math reading
 3 001   70      88
 
  rbind(mylast[[2]])
    id math reading
 4 002   65      NA
 
  rbind(mylast[[3]])
    id math reading
 6 003   70      NA
 
 
 Of course, the result of lapply() is that the above are combined into a
 single R list object and returned:
 
  lapply(mylast, rbind)
 $`001`
    id math reading
 3 001   70      88
 
 $`002`
    id math reading
 4 002   65      NA
 
 $`003`
    id math reading
 6 003   70      NA
 
 
 It is a subtle, but of course critical, difference in how the internal
 function is called and how the arguments are passed.
 
 Does that help?
 
 Regards,
 
 Marc Schwartz
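
The distinction drawn above can be checked directly. The sketch below rebuilds the example data from the thread and compares the shapes of the two results; Reduce() is added here only as a third point of comparison and is not part of the original discussion.

```r
mydata <- data.frame(
  id      = c("001", "001", "001", "002", "003", "003"),
  math    = c(80, 75, 70, 65, 65, 70),
  reading = c(65, 70, 88, NA, 90, NA)
)
mylast <- lapply(split(mydata, mydata$id), tail, n = 1)

# do.call: ONE call, equivalent to rbind(mylast[[1]], mylast[[2]], mylast[[3]])
one_call <- do.call(rbind, mylast)

# lapply: THREE calls, rbind(mylast[[1]]), rbind(mylast[[2]]), rbind(mylast[[3]]),
# each returning its single data frame unchanged, collected in a list
three_calls <- lapply(mylast, rbind)

# Reduce: successive pairwise calls, rbind(rbind(a, b), c)
pairwise <- Reduce(rbind, mylast)
```

Here one_call and pairwise are single data frames with three rows, while three_calls is still a list of three one-row data frames.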




Re: [R] how to get lsmeans?

2007-03-23 Thread Muenchen, Robert A (Bob)
The Exegesis paper gave me a great look at the history of all this. I
had not been aware that S-PLUS had gone that route. There is much to be
said for knowing you might be more successful but sticking to your
perspective instead. And in the long run, that may be the more
successful route anyway. 

Thanks,
Bob


 -Original Message-
 From: Liaw, Andy [mailto:[EMAIL PROTECTED]
 Sent: Thursday, March 22, 2007 5:27 PM
 To: Douglas Bates; Muenchen, Robert A (Bob)
 Cc: R-help@stat.math.ethz.ch
 Subject: RE: [R] how to get lsmeans?
 
 From: Douglas Bates
 
  On 3/22/07, Muenchen, Robert A (Bob) [EMAIL PROTECTED] wrote:
 
   Perhaps I'm stating the obvious, but to increase the use of R in
   places where SAS & SPSS dominate, it's important to make getting the
   same answers as easy as possible. That includes things like lsmeans
   and type III sums of squares. I've read lots of discussions here on
   sums of squares & I'm not advocating type III use, just looking at it
   from a marketing perspective. Too many people look for excuses to not
   change. The fewer excuses, the better.
 
  You may get strong reactions to such a suggestion.  I
  recommend reading Bill Venables' famous unpublished paper
  "Exegeses on linear models" (google for the title - very few
  people use "Exegeses" and "linear models" in the same
  sentence - in fact I would not be surprised if Bill was the
  only one who has ever done so).
 
 It's on the MASS page:
 http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf
 I believe it's based on a talk Bill gave at an S-PLUS User's Conference.
 I think it deserves to be required reading for all graduate-level linear
 models courses.
 
  You must realize that R is written by experts in statistics
  and statistical computing who, despite popular opinion, do
  not believe that everything in SAS and SPSS is worth copying.
  Some things done in such packages, which trace their roots
  back to the days of punched cards and magnetic tape when
  fitting a single linear model may take several days because
  your first 5 attempts failed due to syntax errors in the JCL
  or the SAS code, still reflect the approach of "give me every
  possible statistic that could be calculated from this model,
  whether or not it makes sense".  The approach taken in R is
  different.  The underlying assumption is that the useR is
  thinking about the analysis while doing it.
 
  The fact that it is so difficult to explain what lsmeans are
  and why they would be of interest is an indication of why
  they aren't implemented in any of the required packages.
 
 Perhaps I should have made it clear in my original post:  I gave the
 example and code more to show what the mysterious "least squares means"
 are (which John explained lucidly), than how to replicate what SAS (or
 JMP) outputs.  I do not understand how people can feel comfortable
 reporting things like lsmeans and p-values from type <insert your
 favorite Roman numeral here> tests when they do not know how such things
 arise or, at the very least, what they _really_ mean.  (Given how simply
 lsmeans are computed, not knowing how to compute them is pretty much the
 same as not knowing what they are.)  One of the dangers of wholesale
 output as SAS or SPSS gives is for the user to simply pick an answer and
 run with it, without understanding what that answer is, or if it
 corresponds to the question of interest.
 
 As to whether to weight the levels of the factors being held constant,
 my suggestion to John would be to offer both choices (unweighted and
 weighted by observed frequencies).  I can see why one would want to
 weight by observed frequencies (if the data are sampled from a
 population), but there are certainly situations (perhaps more often than
 not in the cases I've encountered) that the observed frequencies do not
 come close to approximating what they are in the population.  In such
 cases the unweighted average would make more sense to me.
 
 Cheers,
 Andy
 
 
-Original Message-
From: [EMAIL PROTECTED] [mailto:r-help-
[EMAIL PROTECTED] On Behalf Of John Fox
Sent: Wednesday, March 21, 2007 8:59 PM
To: 'Prof Brian Ripley'
Cc: 'r-help'; 'Chuck Cleland'
Subject: Re: [R] how to get lsmeans?
   
Dear Brian et al.,
   
My apologies for chiming in late: It's been a busy day.
   
First some general comments on least-squares means and effect
displays.
The general idea behind the two is similar -- to examine fitted
values corresponding to a term in a model while holding

Re: [R] how to get lsmeans?

2007-03-22 Thread Muenchen, Robert A (Bob)
Hi All,

Perhaps I'm stating the obvious, but to increase the use of R in places
where SAS & SPSS dominate, it's important to make getting the same
answers as easy as possible. That includes things like lsmeans and type
III sums of squares. I've read lots of discussions here on sums of
squares & I'm not advocating type III use, just looking at it from a
marketing perspective. Too many people look for excuses to not change.
The fewer excuses, the better.

Of course this is easy for me to say, as I'm not the one who does the
work! Much thanks to those who do.

Cheers,
Bob


 -Original Message-
 From: [EMAIL PROTECTED] [mailto:r-help-
 [EMAIL PROTECTED] On Behalf Of John Fox
 Sent: Wednesday, March 21, 2007 8:59 PM
 To: 'Prof Brian Ripley'
 Cc: 'r-help'; 'Chuck Cleland'
 Subject: Re: [R] how to get lsmeans?
 
 Dear Brian et al.,
 
 My apologies for chiming in late: It's been a busy day.
 
 First some general comments on least-squares means and effect displays.
 The general idea behind the two is similar -- to examine fitted values
 corresponding to a term in a model while holding other terms to typical
 values -- but the implementation is not identical. There are also other
 similar ideas floating around. My formulation is more general in the
 sense that it applies to a wider variety of models, both linear and
 otherwise.
 
 Least-squares means (a horrible term, by the way: in a 1980 paper in the
 American Statistician, Searle, Speed, and Milliken suggested the more
 descriptive term "population marginal means") apply to factors and
 combinations of factors; covariates are set to mean values and the
 levels of other factors are averaged over, in effect applying equal
 weight to each level. (This is from memory, so it's possible that I'm
 not getting it quite right, but I believe that I am.) In my effect
 displays, each level of a factor is weighted by its proportion in the
 data. In models in which least-squares means can be computed, they
 should differ from the corresponding effect display by a constant (if
 there are different numbers of observations in the different levels of
 the factors that are held
 constant).
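
The equal-weight averaging described above can be sketched by predicting at every combination of factor levels and then averaging. This is only an illustration under stated assumptions, not the method used by the effects package: it uses the built-in warpbreaks data, an arbitrary set of dropped rows to make the design unbalanced, and an additive model with no covariates (a covariate would be fixed at its mean in the grid).

```r
# Unbalanced subset of the built-in warpbreaks data (rows dropped arbitrarily).
d <- warpbreaks[-c(1, 2, 3, 10, 30), ]
fit <- lm(breaks ~ wool + tension, data = d)

# Fitted values for every wool x tension combination ...
grid <- expand.grid(wool = levels(d$wool), tension = levels(d$tension))
grid$fit <- predict(fit, newdata = grid)

# ... averaged over tension with equal weight for each level.
lsm <- tapply(grid$fit, grid$wool, mean)

# Raw group means for comparison; these implicitly weight the tension
# levels by their observed frequencies within each wool type.
raw <- tapply(d$breaks, d$wool, mean)
```

In an additive model like this one, the difference between the two entries of lsm equals the woolB coefficient, whatever the cell counts are.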
 
 The obstacle to computing either least-squares means or effect displays
 in R via predict() is that predict() wants factors in the new data to be
 set to particular levels. The effect() function in the effects package
 bypasses predict() and works directly with the model matrix, averaging
 over the columns that pertain to a factor (and reconstructing
 interactions as necessary). As mentioned, this has the effect of setting
 the factor to its proportional distribution in the data. This approach
 also has the advantage of being invariant with respect to the choice of
 contrasts for a factor.
 
 The only convenient way that I can think of to implement least-squares
 means in R would be to use deviation-coded regressors for a factor (that
 is, contr.sum) and then to set the columns of the model matrix for the
 factor(s) to be averaged over to 0. It may just be that I'm having a
 failure of imagination and that there's a better way to proceed. I've
 not implemented this solution because it is dependent upon the choice of
 contrasts and because I don't see a general advantage to it, but since
 the issue has come up several times now, maybe I should take a crack at
 it. Remember that I want this to work more generally, not just for
 levels of factors, and not just for linear models.
 
 Brian is quite right in mentioning that he suggested some time ago that
 I use critical values of t rather than of the standard normal
 distribution for producing confidence intervals, and I agree that it
 makes sense to do so in models in which the dispersion is estimated. My
 only excuse for not yet doing this is that I want to undertake a more
 general revision of the effects package, and haven't had time to do it.
 There are several changes that I'd like to make to the package. For
 example, I have results for multinomial and proportional odds logit
 models (described in a paper by me and Bob Andersen in the 2006 issue of
 Sociological Methodology) that I want to incorporate, and I'd like to
 improve the appearance of the default graphs. But Brian's suggestion is
 very straightforward, and I guess that I shouldn't wait to implement it;
 I'll do so very soon.
 
 Regards,
  John
 
 
 John Fox
 Department of Sociology
 McMaster University
 Hamilton, Ontario
 Canada L8S 4M4
 905-525-9140x23604
 

Re: [R] Select the last two rows by id group

2007-03-21 Thread Muenchen, Robert A (Bob)
Marc, thanks for so many great variations! I especially like:

tail(sort(table(DF$County)))

I often have frequency tables that are of interest only towards the end.
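
A small sketch of that idiom on made-up data (the county vector below is invented for illustration):

```r
county <- c("Washington", "Allen", "Washington", "Madison",
            "Washington", "Madison")

freq <- sort(table(county))   # ascending, so the largest counts come last
top2 <- tail(freq, 2)         # the two most frequent values
```

Because sort() orders the table in ascending order by default, tail() picks out the most frequent categories.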

Cheers,
Bob



 -Original Message-
 From: Marc Schwartz [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, March 20, 2007 8:58 PM
 To: Muenchen, Robert A (Bob)
 Cc: R-help@stat.math.ethz.ch
 Subject: Re: [R] Select the last two rows by id group
 
 On Tue, 2007-03-20 at 11:53 -0400, Muenchen, Robert A (Bob) wrote:
  Very nice! This almost duplicates the SAS first.var and last.var
  ability to choose the first and last observations by group(s).
  Substituting the head function in where Marc has the tail function
  below will adapt it to the first n. It is more flexible than the SAS
  approach because it can do the first/last n rather than just the single
  first or last.
 
  Let's say we want to choose the last observation in a county, and
  counties have duplicate names in different states. You could sort by
  state, then county, then use only county where Marc uses score$id in
  his last example below, and it would get the last record for *every*
  county regardless of duplicates. Does this sound correct?
 
  That's a handy bit of code!
 
  Cheers,
  Bob
 
 Bob,
 
 You can test it using data here:
 
 DF <- read.csv("http://www.nws.noaa.gov/nwr/SameCode.txt",
                header = FALSE)
 
 colnames(DF) <- c("Code", "County", "State")
 
  str(DF)
 'data.frame':   3288 obs. of  3 variables:
  $ Code  : int  1001 1003 1005 1007 1009 1011 1013 1015 1017 1019 ...
  $ County: Factor w/ 1996 levels "Abbeville","Acadia",..: 97 105 116 169 186 249 259 272 326 348 ...
  $ State : Factor w/ 60 levels "AK","AL","AR",..: 2 2 2 2 2 2 2 2 2 2 ...
 
 
 The data is already sorted by State and then County.
 
 
  system.time(DF.tail <- do.call(rbind, lapply(split(DF, DF$County), tail, 1)))
 [1] 6.851 0.085 7.085 0.000 0.000
 
 
  str(DF.tail)
 'data.frame':   1996 obs. of  3 variables:
  $ Code  : int  45001 22001 16001 40001 55001 50001 72001 72003 72005 72007 ...
  $ County: Factor w/ 1996 levels "Abbeville","Acadia",..: 1 2 3 4 5 6 7 8 9 10 ...
  $ State : Factor w/ 60 levels "AK","AL","AR",..: 48 22 17 42 58 56 45 45 45 45 ...
 
 
 # How many unique county names in the source dataset?
 
  length(unique(DF$County))
 [1] 1996
 
 
 # Are they all the same unique counties?
 
  all(DF.tail$County == sort(unique(DF$County)))
 [1] TRUE
 
 
 It is curious to see just how many duplicates there are. For example:
 
  tail(sort(table(DF$County)))
 
    Madison    Jackson    Lincoln   Franklin  Jefferson Washington
         20         24         24         25         26         31
 
 
  subset(DF, County == "Washington")
       Code     County State
 65    1129 Washington    AL
 181   5143 Washington    AR
 304   8121 Washington    CO
 385  12133 Washington    FL
 535  13303 Washington    GA
 593  16087 Washington    ID
 688  17189 Washington    IL
 783  18175 Washington    IN
 879  19183 Washington    IA
 987  20201 Washington    KS
 1106 21229 Washington    KY
 1167 22117 Washington    LA
 1189 23029 Washington    ME
 1211 24043 Washington    MD
 1393 27163 Washington    MN
 1474 28151 Washington    MS
 1590 29221 Washington    MO
 1740 31177 Washington    NE
 1883 36115 Washington    NY
 1981 37187 Washington    NC
 2124 39167 Washington    OH
 2202 40147 Washington    OK
 2239 41067 Washington    OR
 2304 42125 Washington    PA
 2313 44009 Washington    RI
 2515 47179 Washington    TN
 2759 48477 Washington    TX
 2800 49053 Washington    UT
 2814 50023 Washington    VT
 2904 51191 Washington    VA
 3108 55131 Washington    WI
 
 
 # The last state with Washington County (my neighbors, the
 # Cheeseheads) was in the result set
 
  subset(DF.tail, County == "Washington")
             Code     County State
 Washington 55131 Washington    WI
 
 
 
  subset(DF, County == "Allen")
       Code County State
 697  18003  Allen    IN
 887  20001  Allen    KS
 993  21003  Allen    KY
 1113 22003  Allen    LA
 2042 39003  Allen    OH
 
 
 # The last state with Allen County (OH) was in the result set
 
  subset(DF.tail, County == "Allen")
        Code County State
 Allen 39003  Allen    OH
 
 
 Just noticed a Big Ten theme there...Go Gophers!   ;-)
 
 
 So, it would seem that your hypothesis is correct, at least in this
 limited testing.  I would want to validate it more rigorously of
 course.
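
As a small check of the same point, the sketch below uses four rows taken from the listings above. Splitting on County alone returns one row per county name; if one row per actual county is wanted instead, one option (an assumption about the goal, not something proposed in the thread) is to split on State and County together.

```r
DF <- data.frame(
  Code   = c(18003, 18175, 39003, 39167),
  County = c("Allen", "Washington", "Allen", "Washington"),
  State  = c("IN", "IN", "OH", "OH")
)

# One row per county NAME: the last state listed for each name.
by_name <- do.call(rbind, lapply(split(DF, DF$County), tail, 1))

# One row per State/County pair; drop = TRUE skips empty combinations.
by_both <- do.call(rbind,
                   lapply(split(DF, list(DF$State, DF$County), drop = TRUE),
                          tail, 1))
```

With these rows, by_name has two rows (both from OH) and by_both has all four.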
 
 HTH,
 
 Marc Schwartz



Re: [R] Select the last two rows by id group

2007-03-20 Thread Muenchen, Robert A (Bob)
Very nice! This almost duplicates the SAS first.var and last.var
ability to choose the first and last observations by group(s).
Substituting the head function in where Marc has the tail function below
will adapt it to the first n. It is more flexible than the SAS approach
because it can do the first/last n rather than just the single first or
last.

Let's say we want to choose the last observation in a county, and
counties have duplicate names in different states. You could sort by
state, then county, then use only county where Marc uses score$id in his
last example below, and it would get the last record for *every* county
regardless of duplicates. Does this sound correct? 

That's a handy bit of code!

Cheers,
Bob



 -Original Message-
 From: [EMAIL PROTECTED] [mailto:r-help-
 [EMAIL PROTECTED] On Behalf Of Marc Schwartz
 Sent: Tuesday, March 20, 2007 10:59 AM
 To: Lauri Nikkinen
 Cc: r-help@stat.math.ethz.ch
 Subject: Re: [R] Select the last two rows by id group
 
 On Tue, 2007-03-20 at 16:33 +0200, Lauri Nikkinen wrote:
  Hi R-users,
 
  Following this post
 http://tolstoy.newcastle.edu.au/R/help/06/06/28965.html ,
  how do I get last two rows (or six or ten) by id group out of the
 data
  frame? Here the example gives just the last row.
 
  Sincere thanks,
  Lauri
 
 A slight modification to Gabor's solution:
 
  score
   id reading math
 1  1      65   80
 2  1      70   75
 3  1      88   70
 4  2      NA   65
 5  3      90   65
 6  3      NA   70
 
 # Return the last '2' rows
 # Note the addition of unlist()
 
  score[unlist(tapply(rownames(score), score$id, tail,  2)), ]
   id reading math
 2  1      70   75
 3  1      88   70
 4  2      NA   65
 5  3      90   65
 6  3      NA   70
 
 
 Note that when tail() returns more than one value, tapply() will create
 a list rather than a vector:
 
  tapply(rownames(score), score$id, tail,  2)
 $`1`
 [1] "2" "3"
 
 $`2`
 [1] "4"
 
 $`3`
 [1] "5" "6"
 
 
 Thus, we need to unlist() the indices to use them in the subsetting
 process that Gabor used in his solution.
 
 Another alternative, if the rownames do not correspond to the sequential
 row indices as they do in this example:
 
  do.call(rbind, lapply(split(score, score$id), tail,  2))
     id reading math
 1.2  1      70   75
 1.3  1      88   70
 2    2      NA   65
 3.5  3      90   65
 3.6  3      NA   70
 
 
 This uses split() to create a list of data frames from score, where each
 data frame is 'split' by the 'id' column values. tail() is then applied
 to each data frame using lapply(), the results of which are then
 rbind()ed back to a single data frame.
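
The split/lapply/rbind steps above can be wrapped up for reuse. In the sketch below, last_n() and first_n() are hypothetical helper names introduced only for illustration; first_n() simply swaps head() in for tail().

```r
score <- data.frame(
  id      = c(1, 1, 1, 2, 3, 3),
  reading = c(65, 70, 88, NA, 90, NA),
  math    = c(80, 75, 70, 65, 65, 70)
)

# Hypothetical helpers: last / first n rows of df within each group.
last_n <- function(df, group, n) {
  do.call(rbind, lapply(split(df, group), tail, n))
}
first_n <- function(df, group, n) {
  do.call(rbind, lapply(split(df, group), head, n))
}

last2  <- last_n(score, score$id, 2)    # up to two rows per id
first1 <- first_n(score, score$id, 1)   # first row per id
```

Groups with fewer than n rows (id 2 here) contribute whatever rows they have.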
 
 HTH,
 
 Marc Schwartz
 


Re: [R] SAS, SPSS Product Comparison Table

2007-02-13 Thread Muenchen, Robert A (Bob)
Hi All,

Thanks to lots of good ideas from R-helpers, I've polished up the table
and posted it here:
http://oit.utk.edu/scc/RforSASSPSSproducts.pdf 

To be consistent with its product orientation, I dropped mixed models
(it's not a separate product in either SAS or SPSS). I also added SAS/QC
and links to similar pages such as CRAN's Task Views. People (especially
Patrick Burns) sent the following list of topics that are not SAS or
SPSS products, but which might make good additions to Task Views:

resampling techniques: boot, coin (and many others)
report generation: R (the Sweave function)
neural networks: nnet, AMORE, neural, grnnR
finance: Rmetrics, portfolio (and several more)
designed experiments: BHH2, blockrand, conf.design, spc
Bayesian: BRugs, R2WinBUGS, bayesm (and many more)
circular statistics: CircStats, circular
robustness: R and many packages
medical imaging: DICOM, AnalyzeFMRI, fmri
functional data analysis: fda, MFDA
Robust
spatial statistics: spatial, spatstat, pastecs, fields, geoR (and more)
Markov chain Monte Carlo: MCMCpack, mcmc
meta-analysis: meta
graphical models: mimR, ggm
Mixed Models:   lmer, nlme, lme4
mixture models: mixreg, mixtools
pharmacokinetics: PK, PKfit, PKtools
musicology: tuneR
sudoku: sudoku

Frank Harrell made an excellent suggestion that this be a page at the
R-wiki. It's unlikely that any one person would know all these areas, so
it might work out if everyone could edit the sections they know. If
anyone wants to put it up there, let me know & I'll be happy to send it
to you in any form you like. I expect that once a table format is
established, editing it would be easy.

I acknowledged everyone who wrote at the bottom of the table. If I
forgot anyone, it was an oversight. Drop me a line & I'll put you on
there. Thanks again to everyone for all the help!

Cheers,
Bob









[R] JGR data editor question

2007-02-10 Thread Muenchen, Robert A (Bob)
Hi All,

I'm learning JGR 1.4-15 with R 2.4.1 in Windows XP (all patches
applied). JGR looks great but I'm having trouble getting the data editor
to save my results. I don't see anything in R-help about it. Here are
the steps I followed:

1. I chose Tools > Object Browser & double-clicked on a data frame,
mydata. 
2. A spreadsheet editor popped up and allowed me to make changes. 
3. I clicked Update at the bottom right of the data editor screen. 
4. It asked, "Export to R?" and had "Export as: mydata" filled in. 
5. I clicked "Yes" and then closed the window by clicking the usual [X]
in the top right corner. 
6. Double-clicking the data file again opened it back up but the changes
were gone. 

Am I missing a step?

Thanks,
Bob



Re: [R] JGR data editor question

2007-02-10 Thread Muenchen, Robert A (Bob)
That's it! I tried an absurd number of variations, but never that! I had
only changed one value and never left the cell. I assumed hitting Enter
would do it. Thanks! -Bob


-Original Message-
From: Jim Porzak [mailto:[EMAIL PROTECTED] 
Sent: Saturday, February 10, 2007 11:46 AM
To: Muenchen, Robert A (Bob)
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] JGR data editor question

Hi Bob,

I cannot reproduce your problem, with one possible exception in your
step 2: in the data editor, you need to click off the last cell you
edited for the changes to take effect.

-- 
HTH,
Jim Porzak
Loyalty Matrix Inc.
San Francisco, CA
http://www.linkedin.com/in/jimporzak



[R] SAS, SPSS Product Comparison Table

2007-02-10 Thread Muenchen, Robert A (Bob)
Hi All,

My paper "R for SAS and SPSS Users" received a bit more of a reaction
than I expected. I posted the link
(http://oit.utk.edu/scc/RforSASSPSSusers.pdf) about 12 days ago on
R-help and the equivalent SAS and SPSS lists. Since then people have
downloaded it 5,503 times and I've gotten lots of questions along the
lines of, "Surely R can't do for free what [fill in a SAS or SPSS
product here] does?" To try to address those, I've compiled a table that
is organized by the product categories SAS and SPSS offer. Keep in mind
that I still know far more about SAS and SPSS than I do about R, so I
could really use some help with this. The table is below in tabbed form.
I would appreciate it if the many R gurus out there would look it over
and send suggestions. I'll add it as an appendix when it's done (well,
as done as a moving target like this ever is!) 

Thanks,
Bob

Topic | SAS Product | SPSS Product | R Package
Advanced Models | SAS/STAT | SPSS Advanced Models(tm) | R
Automated Data Preparation | None | SPSS Data Preparation(tm) | None?
Automated Forecasting | SAS Forecast Studio | DecisionTime/WhatIf(tm) | None?
Basics | SAS | SPSS Base(tm) | R
Conjoint Analysis | SAS/STAT: Transreg | SPSS Conjoint(tm) | Acepack?
Correspondence Analysis | SAS/STAT: Corresp | SPSS Categories(tm) | Homals, MASS, FactoMineR, ade4, PTAk, ccoresp, vegan, made4, PsychoR
Custom Tables | Base: Proc Tabulate | SPSS Custom Tables(tm) | reshape
Data Mining | Enterprise Miner | Clementine | Rattle
Exact Tests | SAS/STAT: various | SPSS Exact Tests(tm) | exactLoglinTest
Genetics | SAS/Genetics, SAS/Microarray Solution, JMP Genomics | None | Bioconductor
GIS/Mapping | SAS/GIS | SPSS Maps(tm) | maps
Graphical User Interface | Enterprise Guide | SPSS | JGR, R Commander, pmg, Sciviews
Graphics | SAS/GRAPH(r) | SPSS Base(tm) | R, ggplot
Guided Analysis | SAS/LAB | None | None
Matrix/Linear Algebra | SAS/IML(tm), SAS/IML Workshop | SPSS Matrix(tm) | R
Missing Values Imputation | SAS/STAT: Proc MI | SPSS Missing Values Analysis(tm) | aregImpute (Hmisc), fit.mult.impute (Design)
Mixed Models | Proc Mixed | SPSS Advanced Models | lmer
Operations Research | SAS/OR | None | TSP
Power Analysis | SAS/STAT: Power, GLM Power | SamplePower(tm) | asypow, powerpkg, pwr
Regression Models | SAS/BASE | SPSS Regression Models(tm) | R
Sampling, Nonrandom | SAS/STAT: surveymeans, etc. | SPSS Complex Samples(tm) | survey
Structural Equations | SAS/STAT: Calis | Amos(tm) | sem
Text Analysis | Text Miner | SPSS Text Analysis for Surveys(tm) | tm
Time Series | SAS/ETS(tm) | SPSS Trends(tm) | ArDec, brainwaver, dyn, fame, Systemfit, tsDyn, tseries, tseriesChaos, tsfa, urca, uroot
Trees, Decision or Regression | Enterprise Miner | SPSS Classification Trees(tm), AnswerTree(tm) | tree, rpart
Visualization | SAS/INSIGHT | None | rggobi, GGobi



Re: [R] R in Industry

2007-02-06 Thread Muenchen, Robert A (Bob)
That sounds like a good idea. The name R makes it especially hard to
find job postings, resumes or do any other type of search. Googling
resume+sas or job opening+sas is quick and fairly effective (less a
few airline jobs). Doing that with R is of course futile. At the risk of
getting flamed, it's too bad it's not called something more unique such
as Rpackage, Rlanguage, etc.

Cheers,
Bob


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Doran, Harold
Sent: Tuesday, February 06, 2007 2:08 PM
To: R-help@stat.math.ethz.ch
Subject: [R] R in Industry

The other day, CNN had a story on working at Google. Out of curiosity, I
went to the Google employment web site (I'm not looking, but just
curious). In perusing their job posts for statisticians, preference is
given to those who use R and python. Other languages, S-Plus and
something called SAS were listed as lower priorities.

When I started using Python, I noted they have a portion of the web site
with job postings. CRAN does not have something similar, but think it
might be useful. I think R is becoming more widely used in industry and
I wonder if helping it move along a bit, the maintainer of CRAN could
create a section of the web site devoted to jobs where R is a
requirement.

Hence, we could have our own little monster.com kind of thing going
on. Of the multitude of ways the gospel can be spread, this is small.
But, I think every small step forward is good.

Anyone think this is useful? 

Harold




Re: [R] R for SAS SPSS Users Document

2007-01-31 Thread Muenchen, Robert A (Bob)
Julien Barnier wrote: ... I think it will be very useful to me, even if
I will use it the reverse way : learn how to use SAS from R...

I hadn't thought of using the document in reverse to learn SAS or SPSS
if you already know R. I'll have to reread it from that perspective and
see if there are any changes I can make to help in that direction
without a total rewrite. If anyone has any suggestions along those
lines, please send them my way.

Thanks for the PDF tip. Several people suggested that. I thought cutting
and pasting examples would be important, which is not as easy from PDF.
OpenOffice can open the .doc version on Linux if you use that. I have
added a PDF version at the same link ending in PDF:
http://oit.utk.edu/scc/RforSASSPSSusers.pdf

Cheers,
Bob


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Julien Barnier
Sent: Wednesday, January 31, 2007 3:20 AM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] R for SAS  SPSS Users Document

Hi,

 I am pleased to announce the availability of the document, R for SAS
 and SPSS Users, at 
 http://oit.utk.edu/scc/RforSASSPSSusers.doc

I've looked at the document and printed it. I think it will be very
useful to me, even if I will use it the reverse way : learn how to
use SAS from R...

As I am far from an R expert, I will not be able to give you good
advice on R code. But maybe you would have had more comments on your
tutorial if you had given the link to the PDF version instead of the
MS Word one:

http://oit.utk.edu/scc/RforSASSPSSusers.pdf

Thanks again for your document,

-- 
Julien



Re: [R] spss.get. Warning with SPSS 14 dataset

2007-01-30 Thread Muenchen, Robert A (Bob)
Here's a warning about that:
http://tolstoy.newcastle.edu.au/R/help/04/12/8827.html 

Bob


 -Original Message-
 From: [EMAIL PROTECTED] [mailto:r-help-
 [EMAIL PROTECTED] On Behalf Of John Kane
 Sent: Friday, January 26, 2007 11:49 AM
 To: R R-help
 Subject: [R] spss.get. Warning with SPSS 14 dataset
 
 I am using spss.get to import an SPSS database
 Data.sav, created with SPSS 14  :
 
 df1 <- spss.get("C:/temp/Data.sav", lowernames=TRUE,
 datevars = c("dateinte"))
 
 I am getting this warning. I get the same warning with
 read.spss.
 
 Warning message:
 C:/temp/Data.sav: Unrecognized record type 7, subtype
 16 encountered in system file
 
 This is a stupid question but should I be worried
 about it?  So far the data looks clean but it is not
 my data base originally and I wondered if there is
 anything specific that I should be checking for.
 
 Thanks.
 


[R] R for SAS SPSS Users Document

2007-01-29 Thread Muenchen, Robert A (Bob)
Greetings,

I am pleased to announce the availability of the document, R for SAS
and SPSS Users, at 
http://oit.utk.edu/scc/RforSASSPSSusers.doc .  It presents an
introductory view of R for people who already know SAS and/or SPSS.
Included are 27 programs written in all three languages (i.e. 81 total)
so that people can see how R works compared to the other two, task by
task.

I would appreciate it if folks with far more R expertise than I have
could review it and provide advice on ways to improve programming
examples or wording. The wording was challenging since the jargon used
by the three packages differs so much. I'm sure there is much room for
improvement.

Cheers,
Bob



[R] Aggregation using list with Hmisc summarize function

2006-12-28 Thread Muenchen, Robert A (Bob)
Hi All,

I'm using the Hmisc summarize function and used list instead of llist to
provide the by variables. It generated an error message. Is this a bug,
or do I misunderstand how Hmisc works with lists? The program below
demonstrates the error message.

Thanks,
Bob

library(Hmisc)

x <- 1:8
group <- c(1,1,1,1,2,2,2,2)
gender <- c(1,2,1,2,1,2,1,2)

mydata <- data.frame(x,group,gender)
attach(mydata)

# Creating a list using Hmisc llist works:
summarize(x, by=llist(group,gender), FUN=mean, na.rm=TRUE)

# Creating a list using built-in list function does not:
summarize(x, by= list(group,gender), FUN=mean, na.rm=TRUE)
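
One possible reading of the difference, assuming (this post does not confirm it) that summarize relies on the names of the by list: base list() leaves its elements unnamed unless names are supplied, while the Hmisc documentation describes llist as carrying the variable names along. The base-R sketch below shows only that naming difference and uses aggregate as a stand-in for summarize; it does not call Hmisc at all.

```r
# Same toy data as in the post.
x      <- 1:8
group  <- c(1,1,1,1,2,2,2,2)
gender <- c(1,2,1,2,1,2,1,2)

# list() without names: the elements are anonymous.
unnamed <- list(group, gender)
# Naming the elements by hand gives what a name-preserving helper would.
named   <- list(group = group, gender = gender)

names(unnamed)   # NULL
names(named)     # "group" "gender"

# Base aggregate() uses the names of the by list to label its output columns.
agg <- aggregate(x, by = named, FUN = mean)
print(agg)
```

If the names are indeed the issue, passing list(group = group, gender = gender) to summarize would be the thing to try, though that is untested here.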

 



Re: [R] Switching labels on a factor

2006-12-19 Thread Muenchen, Robert A (Bob)
Chris,

Argh!!! I was writing a reply just now insisting that the output below
makes no sense when it finally hit me: the first line of output from the
unclass function is just the data and bears no relationship whatsoever
to the order of the "m" and "f" below it. I had gotten the idea that
it picked up on the first value and so displayed the label to match. I
don't even want to think about how much time I spent working on a
problem that was nonexistent!

Thank you very much for your help!

Bob

> unclass(mydata$gR)
[1] 2 2 2 2 1 1 1 1
attr(,"levels")
[1] "m" "f"
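
The two parts of that output can be checked separately. In this minimal sketch (the vector is made up to mirror the gender column), the first line printed by unclass is the per-observation integer code, and the levels attribute is the lookup table those codes index into. The label for an observation is levels[code], not the level printed underneath it.

```r
# Hypothetical stand-in for mydata$gender: four women, then four men.
gender <- factor(c("f","f","f","f","m","m","m","m"))
gR     <- relevel(gender, "m")   # make "m" the first level

levels(gender)                   # "f" "m"
levels(gR)                       # "m" "f"

codes_before <- as.integer(gender)   # 1 1 1 1 2 2 2 2
codes_after  <- as.integer(gR)       # 2 2 2 2 1 1 1 1

# Looking each code up in the level table recovers the original labels.
labels_after <- levels(gR)[codes_after]
print(labels_after)

# relevel changed codes and level order together, so the data are unchanged.
same_data <- identical(as.character(gender), as.character(gR))
print(same_data)
```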



-Original Message-
From: Chris Andrews [mailto:[EMAIL PROTECTED] 
Sent: Monday, December 18, 2006 12:05 PM
To: Muenchen, Robert A (Bob)
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Switching labels on a factor


Bob,

This is, I think, exactly what one wants to have happen.  The first four
observations are still women.  Both the labels and the underlying
integers should change.  (If you want to give all the people sex
changes, try Relevel in the Epi package.)

mydata$afterthechange <- Relevel(mydata$gender, list(m="f", f="m"))
mydata


  workshop gender q1 q2 q3 q4 gR afterthechange
1        1      f  1  1  5  1  f              m
2        2      f  2  1  4  1  f              m
3        1      f  2  2  4  3  f              m
4        2      f  3  1 NA  3  f              m
5        1      m  4  5  2  4  m              f
6        2      m  5  4  5  5  m              f
7        1      m  5  3  4  4  m              f
8        2      m  4  5  5 NA  m              f


unclass(mydata$afterthechange)
[1] 1 1 1 1 2 2 2 2
attr(,"levels")
[1] "m" "f"
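
For comparison, the same label swap can be sketched in base R by overwriting the level table and leaving the integer codes alone. This is an alternative to the Epi call above, not a description of how Relevel works internally, and the vector below is a made-up stand-in for mydata$gender.

```r
# Hypothetical stand-in for mydata$gender.
gender <- factor(c("f","f","f","f","m","m","m","m"))

swapped <- gender
# Rename the levels in place: old "f" becomes "m", old "m" becomes "f".
levels(swapped) <- c("m", "f")

swapped_codes  <- as.integer(swapped)     # codes are untouched
swapped_labels <- as.character(swapped)   # labels are exchanged
print(swapped_codes)
print(swapped_labels)
```

Unlike relevel, which reorders levels without changing what each observation is, assigning to levels() changes what every observation is called.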

Chris


-- 
Christopher Andrews, PhD
SUNY Buffalo, Department of Biostatistics
242 Farber Hall, [EMAIL PROTECTED], 716 829 2756



[R] Applying variable labels across a data frame

2006-12-18 Thread Muenchen, Robert A (Bob)
Hi All,

I'm working on a class example that demonstrates one way to deal with
factors and their labels. I create a function called myLabeler and apply
it with lapply. It works on the whole data frame when I subscript it as
in lapply( myQFvars[ ,myQFnames ], myLabeler ) but does not work if I
leave the [] subscripts off. I would appreciate it if anyone could tell
me why. The program below works up until the final two statements.

Thanks,
Bob


# Assigning factor labels to potentially lots of vars.

mystring <-
("id,workshop,gender,q1,q2,q3,q4
 1,1,f,1,1,5,1
 2,2,f,2,1,4,1
 3,1,f,2,2,4,3
 4,2,f,3,1, ,3
 5,1,m,4,5,2,4
 6,2,m,5,4,5,5
 7,1,m,5,3,4,4
 8,2,m,4,5,5,9")

mydata <- read.table(textConnection(mystring),
   header=TRUE,sep=",",row.names="id",na.strings="9")
print(mydata)

# Create copies of q variables to use as factors
# so we can count them.
myQlevels <- c(1,2,3,4,5)
myQlabels <- c("Strongly Disagree",
   "Disagree",
   "Neutral",
   "Agree",
   "Strongly Agree")
print(myQlevels)
print(myQlabels)

# Generate two sets of var names to use.
myQnames  <- paste( "q",  1:4, sep="")
myQFnames <- paste( "qf", 1:4, sep="")
print(myQnames)   #The original names.
print(myQFnames)  #The names for new factor variables.

# Extract the q variables to a separate data frame.
myQFvars <- mydata[ ,myQnames]
print(myQFvars)

# Rename all the variables with F for Factor.
colnames(myQFvars) <- myQFnames
print(myQFvars)

# Create a function to apply the labels to lots of variables.
myLabeler <- function(x) { factor(x, myQlevels, myQlabels) }

# Here's how to use the function on one variable.
summary( myLabeler(myQFvars[ ,"qf1"]) )

#Apply it to all the variables. This method works.
myQFvars[ ,myQFnames] <- lapply( myQFvars[ ,myQFnames ], myLabeler )
summary(myQFvars) #Here are the results I wanted.

# This is the same as above but using the unsubscripted
# data frame name. It does not work.
myTest <- lapply( myQFvars, myLabeler )
summary(myTest) #I'm not sure what these results are.
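
A likely explanation, offered as an assumption since no reply appears with this post: lapply always returns a plain list, so summary() describes a list rather than a data frame; and by the time the last statement runs, the columns are already factors holding the label text, so matching them against the levels 1 to 5 finds nothing. The sketch below reproduces both effects on a small made-up data frame (q, lev, lab and labeler are illustrative names, not from the post).

```r
# Small stand-in for the q variables.
q   <- data.frame(qf1 = c(1, 2, 5), qf2 = c(3, 4, NA))
lev <- 1:5
lab <- c("Strongly Disagree", "Disagree", "Neutral",
         "Agree", "Strongly Agree")
labeler <- function(x) factor(x, levels = lev, labels = lab)

# 1. lapply on a data frame gives back a list, not a data frame.
as_list <- lapply(q, labeler)
print(class(as_list))

# 2. Assigning into q_f[] keeps the data-frame shell and replaces the columns.
q_f <- q
q_f[] <- lapply(q, labeler)
print(class(q_f))

# 3. Applying the labeler a second time: the values are now label strings,
#    none of which match the levels 1:5, so every entry becomes NA.
again <- lapply(q_f, labeler)
print(again$qf1)
```

Running the unsubscripted lapply on the original numeric columns, and wrapping the result in as.data.frame() or assigning it back with [], should give the intended summary.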



[R] Switching labels on a factor

2006-12-15 Thread Muenchen, Robert A (Bob)
Hi All,

I'm perplexed by the way the unclass function displays a factor whose
labels have been swapped with the relevel function. I realize it won't
affect any results and that the relevel did nothing useful in this
particular case. I'm just doing it to learn ways to manipulate factors.
The display of unclass leaves me feeling that the relevel had failed.

I've checked three books and searched R-help, but found no mention of this
particular issue.  

The program below demonstrates the problem. Is this a bug, or is there a
reason for it to work this way?

Thanks,
Bob

mystring <-
("id,workshop,gender,q1,q2,q3,q4
 1,1,f,1,1,5,1
 2,2,f,2,1,4,1
 3,1,f,2,2,4,3
 4,2,f,3,1, ,3
 5,1,m,4,5,2,4
 6,2,m,5,4,5,5
 7,1,m,5,3,4,4
 8,2,m,4,5,5,9")
# stringsAsFactors=TRUE keeps gender a factor in current versions of R.
mydata <- read.table(textConnection(mystring),
   header=TRUE,sep=",",row.names="id",na.strings="9",
   stringsAsFactors=TRUE)
mydata

# Create a gender Releveled variable, gR.
# Now 1=m, 2=f
mydata$gR <- relevel(mydata$gender, "m")

# Print the data to show that the labels of gR match those of gender.
mydata

# Show that the underlying codes have indeed reversed.
as.numeric(mydata$gender)
as.numeric(mydata$gR)

# Unclass the two variables to see that print order 
# implies that both the codes and labels have
# flipped, cancelling each other out. For gR,
# m appears to be associated with 2, and f with 1
unclass(mydata$gender)
unclass(mydata$gR)



Re: [R] Multiple Conditional Tranformations

2006-11-25 Thread Muenchen, Robert A (Bob)
Gabor,

Those are handy variations! Perhaps my brain is still in SAS mode on
this. I'm expecting something like the code below that checks for male
only once, checks for female only when not male (skipping NAs) and does
all formulas under the appropriate conditions. The formulas I made up to
keep the code short may not be as easily modified to let the logical
0/1 values fix them.

if gender=="m" then do;
  Score1=...
  Score2=
  ...
end;
else if gender=="f" then do;
  Score1=...
  Score2=
  ...
end;

R may not have anything quite like that. R certainly has many other
features that SAS lacks.

Thanks,
Bob


-Original Message-
From: Gabor Grothendieck [mailto:[EMAIL PROTECTED] 
Sent: Saturday, November 25, 2006 12:39 AM
To: Muenchen, Robert A (Bob)
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Multiple Conditional Tranformations

And here is a variation:

transform(mydata,
   score1 = (2 + (gender == "m")) * q1 + q2,
   score2 = score1 + 0.5 * q1
)

or

transform(
   transform(mydata, score1 = (2 + (gender == "m")) * q1 + q2),
   score2 = score1 + 0.5 * q1
)
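
The trick in these variations is that a logical comparison counts as 0 or 1 in arithmetic, so (2 + (gender == "m")) is 2 for women and 3 for men. A minimal sketch on made-up data (three rows, one with a missing gender) shows the two-step form; step1 and step2 are illustrative names.

```r
# Made-up rows: one woman, one man, one with gender missing.
mydata <- data.frame(gender = c("f", "m", NA),
                     q1 = c(2, 4, 3),
                     q2 = c(2, 5, 1))

# gender == "m" is FALSE/TRUE/NA, which acts as 0/1/NA in the arithmetic.
step1 <- transform(mydata, score1 = (2 + (gender == "m")) * q1 + q2)

# transform() evaluates its arguments against the data frame it was given,
# so score1 is only available to a later call, as in the nested form.
step2 <- transform(step1, score2 = score1 + 0.5 * q1)
print(step2)
```

Rows with a missing gender come out as NA in both scores, which matches the intent of skipping them.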


On 11/25/06, Gabor Grothendieck [EMAIL PROTECTED] wrote:
 Try this:


 transform(mydata,
   score1 = (2   + (gender == "m")) * q1 + q2,
   score2 = (2.5 + (gender == "m")) * q1 + q2
 )


 On 11/24/06, Muenchen, Robert A (Bob) [EMAIL PROTECTED] wrote:
  Mark,
 
  I finally got that approach to work by spreading the logical condition
  everywhere. That gets the lengths to match. Still, I can't help but
  think there must be a way to specify the logic once per condition.
 
  Thanks,
  Bob
 
  mydata$score1 <- numeric(nrow(mydata)) #just initializing.
  mydata$score2 <- numeric(nrow(mydata))
  mydata$score1 <- NA
  mydata$score2 <- NA
  mydata
 
  mydata$score1[mydata$gender == "f"] <- 2*mydata$q1[mydata$gender=="f"] +
     mydata$q2[mydata$gender=="f"]
  mydata$score2[mydata$gender == "f"] <- 2.5*mydata$q1[mydata$gender=="f"] +
     mydata$q2[mydata$gender=="f"]
  mydata$score1[mydata$gender == "m"] <- 3*mydata$q1[mydata$gender=="m"] +
     mydata$q2[mydata$gender=="m"]
  mydata$score2[mydata$gender == "m"] <- 3.5*mydata$q1[mydata$gender=="m"] +
     mydata$q2[mydata$gender=="m"]
  mydata
 
  -Original Message-
  From: Leeds, Mark (IED) [mailto:[EMAIL PROTECTED]
  Sent: Friday, November 24, 2006 8:45 PM
  To: Muenchen, Robert A (Bob)
  Subject: RE: [R] Multiple Conditional Tranformations
 
  I'm not sure if I understand your question but I don't think you need
  ifelse statements.
 
  myscore <- numeric(q1) ( because I'm not sure how to initialize a list so
  initialize a vector with q1 elements )
 
  myscore <- NA ( I think this should set all the values in myscore to NA )
  myscore[mydata$gender == "f"] <- 2*mydata$q1 + mydata$q2
  myscore[mydata$gender == "m"] <- 3*mydata$q1 + mydata$q2
 
  the above should do what you do in the first part of your code but I
  don't know if that was your question ?
  also, it does it making myscore a vector because I didn't know how to
  initialize a list.
  Someone else may give a better solution. I'm no expert.
 
 
  -Original Message-
  From: [EMAIL PROTECTED]
  [mailto:[EMAIL PROTECTED] On Behalf Of Muenchen, Robert A (Bob)
  Sent: Friday, November 24, 2006 8:27 PM
  To: r-help@stat.math.ethz.ch
  Subject: [R] Multiple Conditional Tranformations
 
  Greetings,
 
  I'm learning R and I'm stuck on a basic concept: how to specify a
  logical condition once and then perform multiple transformations under
  that condition. The program below is simplified to demonstrate the goal.
  Its results are exactly what I want, but I would like to check the
  logical state of gender only once and create both (or any number of)
  scores at once.
 
  mystring <-
  ("id,group,gender,q1,q2,q3,q4
  01,1,f,2,2,5,4
  02,2,f,2,1,4,5
  03,1,f,2,2,4,4
  04,2,f,1,1,5,5
  05,1,m,4,5,4,
  06,2,m,5,4,5,5
  07,1,m,3,3,4,5
  08,2,m,5,5,5,4")
 
  mydata <- read.table(textConnection(mystring),header=TRUE,sep=",",
     row.names="id")
  mydata
 
  #Create score1 so that it differs for males and females:
  mydata$score1 <- ifelse( mydata$gender=="f" ,
     (mydata$score1 <- (2*mydata$q1)+mydata

Re: [R] Multiple Conditional Tranformations

2006-11-25 Thread Muenchen, Robert A (Bob)
That's exactly what I'm looking for. Thanks so much for taking the time
to do it that way. 

On the redundancy issue, I think SAS checks the else if condition only
if the original if is false. The check for f when not m I put in only
to exclude missing values for gender.

Thanks!!
Bob

-Original Message-
From: Gabor Grothendieck [mailto:[EMAIL PROTECTED] 
Sent: Saturday, November 25, 2006 7:37 AM
To: Muenchen, Robert A (Bob)
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Multiple Conditional Tranformations

Firstly your outline does not check once, it checks twice.  First it
checks for "m" and then it redundantly checks for "f".  On the other
hand the two variations in my post do check once.

Although substantially longer than the solutions in my prior posts,
if you want the style shown in your post try this:

mydata2 <- cbind(mydata, score1 = 0, score2 = 0)
is.m <- mydata$gender == "m"

mydata2[is.m, ] <- transform(mydata[is.m, ],
   score1 = 3 * q1 + q2,
   score2 = 3.5 * q1 + q2
)

mydata2[!is.m,] <- transform(mydata2[!is.m, ],
   score1 = 2 * q1 + q2,
   score2 = 2.5 * q1 + q2
)
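
The same "compute the condition once, reuse it for every formula" idea can also be sketched with plain subsetting. The data below are made up, and the explicit !is.na() guard is an addition of this sketch rather than part of the code above: it is one way to honour the "skipping NAs" requirement, since a logical index containing NA cannot be used for this kind of assignment.

```r
# Made-up rows, including one with gender missing.
mydata <- data.frame(gender = c("f", "m", NA, "m"),
                     q1 = c(2, 4, 3, 5),
                     q2 = c(2, 5, 1, 4))

# Evaluate each condition once; NA gender counts as neither.
is.m <- !is.na(mydata$gender) & mydata$gender == "m"
is.f <- !is.na(mydata$gender) & mydata$gender == "f"

mydata$score1 <- NA_real_
mydata$score2 <- NA_real_

# All formulas for men, using the stored index.
m <- mydata[is.m, ]
mydata$score1[is.m] <- 3   * m$q1 + m$q2
mydata$score2[is.m] <- 3.5 * m$q1 + m$q2

# All formulas for women.
f <- mydata[is.f, ]
mydata$score1[is.f] <- 2   * f$q1 + f$q2
mydata$score2[is.f] <- 2.5 * f$q1 + f$q2

print(mydata)
```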


Re: [R] Multiple Conditional Tranformations

2006-11-25 Thread Muenchen, Robert A (Bob)
I have a program that is similar to your longer version, but I could
never get the syntax quite right. This will be a big help in
understanding how 'by' works with functions.

Thanks,
Bob

-Original Message-
From: Gabor Grothendieck [mailto:[EMAIL PROTECTED] 
Sent: Saturday, November 25, 2006 11:11 AM
To: Muenchen, Robert A (Bob)
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Multiple Conditional Tranformations

Here is a correction:

do.call(rbind, by(mydata, 1:nrow(mydata), function(x)
  switch(as.character(x$gender),
    m = transform(x, score1 = 3*q1+q2, score2 = 3.5*q1+q2),
    f = transform(x, score1 = 2*q1+q2, score2 = 2.5*q1+q2),
    transform(x, score1 = NA, score2 = NA))
))
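
A possible group-wise variation on the same idea, offered only as a sketch: instead of calling the function once per row, split the data by gender so that switch() runs once per group, then put the rows back in their original order with unsplit(). The helper name score_group and the sample data are assumptions made for illustration; nobody in the thread proposed this version.

```r
# Hypothetical sample data with the genders interleaved.
mydata <- data.frame(
  gender = c("f", "m", "f", "m"),
  q1 = c(2, 4, 2, 5),
  q2 = c(2, 5, 1, 4)
)

# score_group() is an illustrative helper: it receives all rows of one gender
# and applies that gender's formulas to the whole block at once.
score_group <- function(d) {
  switch(as.character(d$gender[1]),
    m = transform(d, score1 = 3 * q1 + q2, score2 = 3.5 * q1 + q2),
    f = transform(d, score1 = 2 * q1 + q2, score2 = 2.5 * q1 + q2),
    transform(d, score1 = NA_real_, score2 = NA_real_)
  )
}

pieces <- lapply(split(mydata, mydata$gender), score_group)
result <- unsplit(pieces, mydata$gender)  # rows return to their original order
print(result)
```

Note that split() drops rows whose gender is NA, so this sketch assumes gender is never missing; the row-by-row 'by' version above does not have that restriction.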

On 11/25/06, Gabor Grothendieck [EMAIL PROTECTED] wrote:
 Here are some additional solutions.  It appears that the SAS code is
 performing the transformation row by row, and for each row the code in
 your post specifies the transformation, so if you want to do it that way
 we could use 'by' like this (where this time we have also added NA
 processing for the gender):

 do.call(rbind, by(mydata, 1:nrow(mydata), function(x)
   switch(as.character(x$gender),
     m = transform(x, score1 = 3*q1+q2, score2 = 3.5*q1+q2),
     f = transform(x, score1 = 2*q1+q2, score2 = 2.5*q1+q2),
     NA)
 ))

 # or this somewhat longer version:

 do.call(rbind, by(mydata, 1:nrow(mydata), function(x) with(x, {
   if (is.na(gender)) {
     score1 <- score2 <- NA
   } else if (gender == "m") {
     score1 <- 3 * q1 + q2
     score2 <- 3.5 * q1 + q2
   } else if (gender == "f") {
     score1 <- 2 * q1 + q2
     score2 <- 2.5 * q1 + q2
   }
   cbind(x, score1, score2)
 })))


 On 11/25/06, Muenchen, Robert A (Bob) [EMAIL PROTECTED] wrote:
  That's exactly what I'm looking for. Thanks so much for taking the time
  to do it that way.
 
  On the redundancy issue, I think SAS checks the "else if" condition only
  if the original "if" is false. I put in the check for f when not m only
  to exclude missing values for gender.
 
  Thanks!!
  Bob
 
  -Original Message-
  From: Gabor Grothendieck [mailto:[EMAIL PROTECTED]
  Sent: Saturday, November 25, 2006 7:37 AM
  To: Muenchen, Robert A (Bob)
  Cc: r-help@stat.math.ethz.ch
  Subject: Re: [R] Multiple Conditional Tranformations
 
  Firstly, your outline does not check once; it checks twice.  First it
  checks for m and then it redundantly checks for f.  On the other
  hand, the two variations in my post do check once.
 
  Although substantially longer than the solutions in my prior posts,
  if you want the style shown in your post try this:
 
  mydata2 <- cbind(mydata, score1 = 0, score2 = 0)
  is.m <- mydata$gender == "m"
 
  mydata2[is.m, ] <- transform(mydata[is.m, ],
    score1 = 3 * q1 + q2,
    score2 = 3.5 * q1 + q2
  )
 
  mydata2[!is.m, ] <- transform(mydata2[!is.m, ],
    score1 = 2 * q1 + q2,
    score2 = 2.5 * q1 + q2
  )
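
One caveat with the mask style, sketched here under the assumption that gender can be missing: a logical index containing NA is not accepted in a data frame row assignment, and !is.m would also treat every non-male row as female. A possible workaround is to build the masks with %in%, which returns FALSE rather than NA for missing values. The sample data and the is.f mask are illustrative additions, not part of the original post.

```r
# Hypothetical sample data with one missing gender.
mydata <- data.frame(
  gender = c("f", "m", NA),
  q1 = c(2, 4, 3),
  q2 = c(2, 5, 3)
)

# Start from NA scores so rows matching neither mask stay missing.
mydata2 <- cbind(mydata, score1 = NA_real_, score2 = NA_real_)

# %in% never yields NA, so both masks are safe to use as row indices.
is.m <- mydata$gender %in% "m"
is.f <- mydata$gender %in% "f"

mydata2[is.m, ] <- transform(mydata[is.m, ],
  score1 = 3 * q1 + q2,
  score2 = 3.5 * q1 + q2
)
mydata2[is.f, ] <- transform(mydata[is.f, ],
  score1 = 2 * q1 + q2,
  score2 = 2.5 * q1 + q2
)
print(mydata2)
```

Each condition is still written once, and every formula for that condition sits inside a single transform() call.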
 

[R] Multiple Conditional Tranformations

2006-11-24 Thread Muenchen, Robert A (Bob)
Greetings,

 

I'm learning R and I'm stuck on a basic concept: how to specify a
logical condition once and then perform multiple transformations under
that condition. The program below is simplified to demonstrate the goal.
Its results are exactly what I want, but I would like to check the
logical state of gender only once and create both (or any number of)
scores at once.

 

mystring <-
("id,group,gender,q1,q2,q3,q4
01,1,f,2,2,5,4
02,2,f,2,1,4,5
03,1,f,2,2,4,4
04,2,f,1,1,5,5
05,1,m,4,5,4,
06,2,m,5,4,5,5
07,1,m,3,3,4,5
08,2,m,5,5,5,4")

mydata <- read.table(textConnection(mystring),header=TRUE,sep=",",row.names="id")
mydata

#Create score1 so that it differs for males and females:
mydata$score1 <- ifelse( mydata$gender=="f" ,
   (mydata$score1 <- (2*mydata$q1)+mydata$q2),
   ifelse( mydata$gender=="m",
      (mydata$score1 <- (3*mydata$q1)+mydata$q2), NA )
   )
mydata

#Create score2 so that it too differs for males and females:
mydata$score2 <- ifelse( mydata$gender=="f" ,
   (mydata$score2 <- (2.5*mydata$q1)+mydata$q2),
   ifelse( mydata$gender=="m",
      (mydata$score2 <- (3.5*mydata$q1)+mydata$q2), NA )
   )
mydata
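
One way to state the gender logic only once, sketched under the assumption that the two formulas differ only in the q1 coefficient (as they do in the example above): keep the coefficients in named vectors and look them up by gender. The names coef1, coef2 and g, and the sample data, are illustrative.

```r
# Hypothetical sample data; the third row has a missing gender.
mydata <- data.frame(
  gender = c("f", "m", NA, "m"),
  q1 = c(2, 4, 3, 5),
  q2 = c(2, 5, 3, 4)
)

# One lookup table per score: the gender logic lives here and nowhere else.
coef1 <- c(f = 2, m = 3)
coef2 <- c(f = 2.5, m = 3.5)

# as.character() makes the lookup work whether gender is a factor or a string.
g <- as.character(mydata$gender)

# Indexing a named vector by NA gives NA, so missing genders yield NA scores.
mydata$score1 <- unname(coef1[g]) * mydata$q1 + mydata$q2
mydata$score2 <- unname(coef2[g]) * mydata$q1 + mydata$q2
print(mydata)
```

Adding a third score would then mean adding one more coefficient vector and one more line, with no further tests on gender.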

 

 

Thanks!
Bob

=
Bob Muenchen (pronounced Min'-chen), Manager
Statistical Consulting Center
U of TN Office of Information Technology
200 Stokely Management Center, Knoxville, TN 37996-0520
Voice: (865) 974-5230
FAX: (865) 974-4810
Email: [EMAIL PROTECTED]
Web: http://oit.utk.edu/scc,
News: http://listserv.utk.edu/archives/statnews.html
=

 




Re: [R] Multiple Conditional Tranformations

2006-11-24 Thread Muenchen, Robert A (Bob)
Mark,

Here's what I get when I try that approach.

Thanks,
Bob

> mydata$score1<-numeric(mydata$q1) #just initializing.
> mydata$score2<-numeric(mydata$q1)
> mydata$score1<-NA
> mydata$score2<-NA
> mydata
  group gender q1 q2 q3 q4 score1 score2
1     1      f  2  2  5  4     NA     NA
2     2      f  2  1  4  5     NA     NA
3     1      f  2  2  4  4     NA     NA
4     2      f  1  1  5  5     NA     NA
5     1      m  4  5  4 NA     NA     NA
6     2      m  5  4  5  5     NA     NA
7     1      m  3  3  4  5     NA     NA
8     2      m  5  5  5  4     NA     NA
> mydata$score1[mydata$gender == "f"]<-2*mydata$q1 + mydata$q2
Warning message:
number of items to replace is not a multiple of replacement length
> mydata$score2[mydata$gender == "f"]<-2.5*mydata$q1 + mydata$q2
Warning message:
number of items to replace is not a multiple of replacement length
> mydata$score1[mydata$gender == "m"]<-3*mydata$q1 + mydata$q2
Warning message:
number of items to replace is not a multiple of replacement length
> mydata$score2[mydata$gender == "m"]<-3.5*mydata$q1 + mydata$q2
Warning message:
number of items to replace is not a multiple of replacement length
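
A short sketch of where that warning comes from, using made-up data: the left-hand side selects only the rows for one gender, while the right-hand side is computed over every row, so the two lengths disagree. Subsetting the right-hand side with the same mask lines them up. The names is.f, rhs, n_slots and n_values are illustrative.

```r
# Hypothetical sample data: two females followed by two males.
mydata <- data.frame(
  gender = c("f", "f", "m", "m"),
  q1 = c(2, 1, 4, 5),
  q2 = c(2, 1, 5, 4)
)

is.f <- mydata$gender == "f"
rhs <- 2 * mydata$q1 + mydata$q2   # computed for all rows, not just females

n_slots <- sum(is.f)       # positions selected on the left-hand side
n_values <- length(rhs)    # values offered on the right-hand side

# Applying the same mask to the right-hand side makes the lengths agree.
score1 <- rep(NA_real_, nrow(mydata))
score1[is.f] <- rhs[is.f]
print(score1)
```

Without the mask on the right, R fills the selected positions from the start of rhs, so the values only line up by accident when the selected rows happen to come first.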


-Original Message-
From: Leeds, Mark (IED) [mailto:[EMAIL PROTECTED] 
Sent: Friday, November 24, 2006 8:45 PM
To: Muenchen, Robert A (Bob)
Subject: RE: [R] Multiple Conditional Tranformations

I'm not sure if I understand your question but I don't think you need
ifelse statements.

myscore<-numeric(q1) ( because I'm not sure how to initialize a list so
initialize a vector with q1 elements )

myscore<-NA ( I think this should set all the values in myscore to NA )
myscore[mydata$gender == "f"]<-2*mydata$q1 + mydata$q2
myscore[mydata$gender == "m"]<-3*mydata$q1 + mydata$q2

The above should do what you do in the first part of your code, but I
don't know if that was your question.
Also, it makes myscore a vector because I didn't know how to
initialize a list.
Someone else may give a better solution. I'm no expert.



Re: [R] Multiple Conditional Tranformations

2006-11-24 Thread Muenchen, Robert A (Bob)
Mark,

I finally got that approach to work by spreading the logical condition
everywhere. That gets the lengths to match. Still, I can't help but
think there must be a way to specify the logic once per condition.

Thanks,
Bob

mydata$score1<-numeric(length(mydata$q1)) #just initializing.
mydata$score2<-numeric(length(mydata$q1))
mydata$score1<-NA
mydata$score2<-NA
mydata

mydata$score1[mydata$gender == "f"]<-  2*mydata$q1[mydata$gender=="f"] +
  mydata$q2[mydata$gender=="f"]
mydata$score2[mydata$gender == "f"]<-2.5*mydata$q1[mydata$gender=="f"] +
  mydata$q2[mydata$gender=="f"]
mydata$score1[mydata$gender == "m"]<-  3*mydata$q1[mydata$gender=="m"] +
  mydata$q2[mydata$gender=="m"]
mydata$score2[mydata$gender == "m"]<-3.5*mydata$q1[mydata$gender=="m"] +
  mydata$q2[mydata$gender=="m"]
mydata


Re: [R] Multiple Conditional Tranformations

2006-11-24 Thread Muenchen, Robert A (Bob)
Good idea. I'm still getting used to how flexible R is on substitutions
like that! -Bob


-Original Message-
From: Leeds, Mark (IED) [mailto:[EMAIL PROTECTED] 
Sent: Friday, November 24, 2006 10:20 PM
To: Muenchen, Robert A (Bob)
Subject: RE: [R] Multiple Conditional Tranformations

You could set temp<-which(mydata$gender == "f") and then temp
will have the female indices, and
then you could just put temp everywhere instead of the statement, but I
think that's the best you can do.
Definitely, someone will reply and there may be a shorter way that I am
unaware of.
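
A minimal sketch of that suggestion, with made-up data: compute each index vector once with which() and reuse it on both sides of every assignment. The short names f and m are illustrative.

```r
# Hypothetical sample data; the third row has a missing gender.
mydata <- data.frame(
  gender = c("f", "m", NA, "m"),
  q1 = c(2, 4, 3, 5),
  q2 = c(2, 5, 3, 4)
)

# which() returns the positions where the test is TRUE and silently drops NA,
# so each condition is evaluated exactly once.
f <- which(mydata$gender == "f")
m <- which(mydata$gender == "m")

mydata$score1 <- NA_real_
mydata$score2 <- NA_real_

mydata$score1[f] <- 2   * mydata$q1[f] + mydata$q2[f]
mydata$score2[f] <- 2.5 * mydata$q1[f] + mydata$q2[f]
mydata$score1[m] <- 3   * mydata$q1[m] + mydata$q2[m]
mydata$score2[m] <- 3.5 * mydata$q1[m] + mydata$q2[m]
print(mydata)
```

The index still has to be repeated in each formula, but the comparison itself is no longer re-evaluated, and rows with a missing gender keep their NA scores.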


