[R] boxcox transformation

2006-07-29 Thread Torsten Mathies
I've got a vector of data (hours to drive from a to b) y. 
 
After a qqplot I know, that they don't fit the normal probability. 
 
I would like to transform these data with the boxcox transformation
(MASS), that they fit the model.
 
When I try
 
ybx-boxcox(y~1,0) 
qqnorm(ybx)
 
the plot is different from
 
library (TeachingDemos)
ybct-bct(y,0) //
qqnorm(ybct)
 
How can I transform y to fit with the normal probability model?
 
 
Yours
Torsten

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] (no subject)

2006-07-29 Thread John Morrow
Dear R-Helpers, 

I have a large data matrix (9707 rows, 60 columns), which contains missing
data. The matrix looks something like this: 

1) X X X X X X  NA  X X X X X X X X X 

2) NA NA NA NA X NA NA NA X NA NA 

3) NA NA X NA NA NA NA NA NA NA 

5) NA X NA X X X NA X X X X NA X 

..

9708) X NA NA X NA NA X X NA NA X

.and so on. Notice that every row has a varying number of entries, all rows
have at least one entry, but some rows have too much missing data.  My goal
is to filter out/remove rows that have ~5 (this number is yet to be
determined, but let's say its 5) missing entries before I run pearsons to
tell me correlation between all of the rows.  The order of the columns does
not matter here.
I think that I might need to test each row for a data, at least one NA,
data pattern?

Is there some kind of way of doing this? I am at a loss for an easy way to
accomplishing this. Any suggestions are most appreciated! 

John Morrow

 


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R-pkgs] Version 0.4.3 of odfWeave

2006-07-29 Thread Kuhn, Max
A new version of odfWeave is on CRAN. Changes include:

 - handling of locales. Errors were being produced when locales were set
to anything but C. The fix changes the locale to C and changes back
to the original locale when the user's code is executed.

 - a bug fix for default plot device units (to prevent bitmap graphics
from being 480x480 inches on non-Windows systems).

 - a check was inserted at the beginning of the odfWeave function to
test for zipping/unzipping utilities. This is only done if the default
value for zipCmd in odfWeaveControl was not used. A more meaningful
error message is produced.

 - the manual is now a legal package vignette and accessible using
vignette(odfWeave).

Thanks to Manuel Morales, Ronggui, Andrew Robinson, Frank Harrell and
David Hajage for their help in finding bugs.

Please send any comments, questions, bugs etc.

Max Kuhn
Research Statistics
Pfizer Global RD
Max.Kuhn at pfizer.com
--
LEGAL NOTICE\ Unless expressly stated otherwise, this messag...{{dropped}}

___
R-packages mailing list
[EMAIL PROTECTED]
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Non-interpreted strings

2006-07-29 Thread Prof Brian Ripley
On Thu, 27 Jul 2006, [EMAIL PROTECTED] wrote:

 I am new to R, so please forgive me if there is an obvious answer to
 this question.  I have done fairly extensive searching through R docs,
 google and a few R users and have not found an answer to my question.
 
 Is there a way to create a non-interpreted string object in R?
 
 For example, I am using R in a MS Windows environment and I would like
 to paste DOS paths into some R command:
   setwd(c:\some\directory)
 
 Obviously this does not work because of the escaping mechanism.  And I
 know the obvious answer, use \\.  But if you do a lot of pasting into
 R it could get tedious manually editing escape sequences.

Why manually edit?  You can do this many ways, including with R (reading 
from a file or from a separate line).  E.g.

 setwd(readline(new dir: ))
new dir: c:\TEMP
 getwd()
[1] c:/TEMP

 I did find a workable solution to this particular problem:
   setwd(choose.dir())EnterPasteEnter
 This saves me from having to do the editing myself.  I can conceive of
 other examples of wanting to paste other more abstract stings into R
 that may happen to have a \ in it.  And now, thanks to choose.dir(), I
 have a way to do the translation automagically but...
 
 My question is, is there any way in R to not interpret the string and
 store the string as is?  For instance, Perl allows you to do interpreted
 ( ) and non-interpreted strings (' ').  This does not work in R; ' '
 acts just like   and my testing indicates that the interpretation is
 done at parse time.  Is there any language level construct for creating
 a non-interpreted string in R?

No.

 
   [[alternative HTML version deleted]]
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

[Please do, not send HTML code and BTW the answer is in the recent list 
archives.]
 

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] fancier plotting

2006-07-29 Thread Fred J.
Hi

thank you for talking the time to help me with this.

I have a sequence of numbers in a file and an equal sequence of various 
character, say(a b c d) each occurs more than once. I need to plot the numbers 
so that numbers corresponding to a in the other sequence would have green dots, 
those corresponding to b a red dot, nothing on c and blue square for d. i.e

2 a  show a green dot
4 b  show a red dot
8 b  show a red dot
6 c  show default colour
2 d  show blue square

I have the code below which plots the data but I have no clue how to inject the 
extra fancies.


###
# ploting #
###
library(tkrplot)

#just the turning points 
L - length(I0); #points to plot

tt - tktoplevel()
left - tclVar(1)
oldleft - tclVar(1)
right - tclVar(L)
cury - tclVar(' ')
curx - NA
tmpusr - numeric(4)
tmpplt - numeric(4)

f1 - function(){
  lleft - as.numeric(tclvalue(left))
  rright - as.numeric(tclvalue(right))
  x - seq(lleft,rright,by=1)
  par(bg='black', fg='green', col='white', col.axis='white',
  col.lab='magenta', col.main='blue', col.sub='cyan')
  plot(x,I0[x], type='s')

##   par(new=TRUE)
##   plot(x,I1[x], type='s', col='yellow',axes=F)
  
  par(new=TRUE)
  plot(x,I2[x], type='s', col='cyan',axes=F)

  axis(4)
  tmpusr - par('usr')
  tmpplt - par('plt')

  if(!is.na(curx)){
abline(v=curx, col='red', lty=2)
abline(h=332, col='red', lty=2)
points(curx,I0[curx],pch=16,col='red')
  }

}

img - tkrplot(tt, f1,hscale=2,vscale=1.2)
tkconfigure(img, cursor='crosshair')

f2 - function(...){
ol - as.numeric(tclvalue(oldleft))
tclvalue(oldleft) - tclvalue(left)
r - as.numeric(tclvalue(right))
tclvalue(right) - as.character(r + as.numeric(...) - ol)
tkrreplot(img)
}

f3 - function(...){
tkrreplot(img)
}

f4 - function(...){
  i - 100
  ol - as.numeric(tclvalue(oldleft))
  tclvalue(left) - as.character(ol+i)
  tclvalue(oldleft) - as.character(ol+i)
  r - as.numeric(tclvalue(right))
  tclvalue(right) - as.character(r+i)
  tkrreplot(img)
}

iw - as.numeric(tcl('image','width', tkcget(img,'-image')))
ih - as.numeric(tcl('image','height',tkcget(img,'-image')))

mm - function(x,Y){
  tx - (as.numeric(x)-1)/iw
  ty - 1-(as.numeric(Y)-1)/ih
  
  if( tx  tmpplt[1]  tx  tmpplt[2] 
 ty  tmpplt[3]  ty  tmpplt[4] ){

newx - (tx-tmpplt[1])/(tmpplt[2]-tmpplt[1])*(tmpusr[2]-tmpusr[1])+tmpusr[1]
curx - round(newx)
tkrreplot(img)

newy - I0[curx]
newy - floor(newy) + (newy-floor(newy))*32/100

#newy2 - I1[curx]
newy3 - I2[curx]

tclvalue(cury) - paste('x =',curx,'  I0=',round(newy,2)
,'  I2=',round(newy3,4))
  }
} 

tkbind(img, 'Motion', mm)

l1 - tklabel(tt, textvariable=cury)

s1 - tkscale(tt, command=f2, from=1, to=length(I0),
variable=left, orient=horiz,label='left',length=700)
s2  - tkscale(tt, command=f3, from=1, to=length(I0),
variable=right, orient=horiz,label='right',length=700)
b1 - tkbutton(tt, text='-', command=f4)

tkpack(l1,img,s1,s2,b1)


 __



[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Extracting from a matrix w/o for-loop

2006-07-29 Thread Adrian DUSA
Hi,

On Friday 28 July 2006 20:21, Horace Tso wrote:
 Unless there is another level of complexity that i didn't see here,
 wouldn't it be a simply application of sapply as follow,

 sapply( 1:dim(M2)[[1]], function(x) M1[M2[x,1], M2[x,2]] )

Andy's previous answer involving matrix indexing (M1[M2]) is simpler but just 
for the sake of it, since we're dealing with matrices it is not a case of 
sapply but of _apply_:

apply(M2, 1, function(x) M1[x[1], x[2]])

My 2c,
Adrian

-- 
Adrian DUSA
Arhiva Romana de Date Sociale
Bd. Schitu Magureanu nr.1
050025 Bucuresti sectorul 5
Romania
Tel./Fax: +40 21 3126618 \
  +40 21 3120210 / int.101

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fancier plotting

2006-07-29 Thread jim holtman
Wasn't exactly sure what you wanted to do.  Is this close?

mypch - c(a=19, b=19, c=19, d=22) #point type
mycol - c(a='green', b='red', c='black', d='blue') #color
mydf - data.frame(x=c('a','b', 'b','c','d'), y=c(2, 4, 8, 6, 2))
plot(mydf$y, type='p', pch=mypch[mydf$x], col=mycol[mydf$x])


On 7/29/06, Fred J. [EMAIL PROTECTED] wrote:

 Hi

 thank you for talking the time to help me with this.

 I have a sequence of numbers in a file and an equal sequence of various
 character, say(a b c d) each occurs more than once. I need to plot the
 numbers so that numbers corresponding to a in the other sequence would have
 green dots, those corresponding to b a red dot, nothing on c and blue square
 for d. i.e

 2 a  show a green dot
 4 b  show a red dot
 8 b  show a red dot
 6 c  show default colour
 2 d  show blue square

 I have the code below which plots the data but I have no clue how to
 inject the extra fancies.

 
 ###
 # ploting #
 ###
 library(tkrplot)

 #just the turning points
 L - length(I0); #points to plot

 tt - tktoplevel()
 left - tclVar(1)
 oldleft - tclVar(1)
 right - tclVar(L)
 cury - tclVar(' ')
 curx - NA
 tmpusr - numeric(4)
 tmpplt - numeric(4)

 f1 - function(){
 lleft - as.numeric(tclvalue(left))
 rright - as.numeric(tclvalue(right))
 x - seq(lleft,rright,by=1)
 par(bg='black', fg='green', col='white', col.axis='white',
  col.lab='magenta', col.main='blue', col.sub='cyan')
 plot(x,I0[x], type='s')

 ##   par(new=TRUE)
 ##   plot(x,I1[x], type='s', col='yellow',axes=F)

 par(new=TRUE)
 plot(x,I2[x], type='s', col='cyan',axes=F)

 axis(4)
 tmpusr - par('usr')
 tmpplt - par('plt')

 if(!is.na(curx)){
abline(v=curx, col='red', lty=2)
abline(h=332, col='red', lty=2)
points(curx,I0[curx],pch=16,col='red')
 }

 }

 img - tkrplot(tt, f1,hscale=2,vscale=1.2)
 tkconfigure(img, cursor='crosshair')

 f2 - function(...){
ol - as.numeric(tclvalue(oldleft))
tclvalue(oldleft) - tclvalue(left)
r - as.numeric(tclvalue(right))
tclvalue(right) - as.character(r + as.numeric(...) - ol)
tkrreplot(img)
 }

 f3 - function(...){
tkrreplot(img)
 }

 f4 - function(...){
 i - 100
 ol - as.numeric(tclvalue(oldleft))
 tclvalue(left) - as.character(ol+i)
 tclvalue(oldleft) - as.character(ol+i)
 r - as.numeric(tclvalue(right))
 tclvalue(right) - as.character(r+i)
 tkrreplot(img)
 }

 iw - as.numeric(tcl('image','width', tkcget(img,'-image')))
 ih - as.numeric(tcl('image','height',tkcget(img,'-image')))

 mm - function(x,Y){
 tx - (as.numeric(x)-1)/iw
 ty - 1-(as.numeric(Y)-1)/ih

 if( tx  tmpplt[1]  tx  tmpplt[2] 
 ty  tmpplt[3]  ty  tmpplt[4] ){

newx -
 (tx-tmpplt[1])/(tmpplt[2]-tmpplt[1])*(tmpusr[2]-tmpusr[1])+tmpusr[1]
curx - round(newx)
tkrreplot(img)

newy - I0[curx]
newy - floor(newy) + (newy-floor(newy))*32/100

 #newy2 - I1[curx]
newy3 - I2[curx]

tclvalue(cury) - paste('x =',curx,'  I0=',round(newy,2)
,'  I2=',round(newy3,4))
 }
 }

 tkbind(img, 'Motion', mm)

 l1 - tklabel(tt, textvariable=cury)

 s1 - tkscale(tt, command=f2, from=1, to=length(I0),
variable=left, orient=horiz,label='left',length=700)
 s2  - tkscale(tt, command=f3, from=1, to=length(I0),
variable=right, orient=horiz,label='right',length=700)
 b1 - tkbutton(tt, text='-', command=f4)

 tkpack(l1,img,s1,s2,b1)


 __



[[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] (no subject)

2006-07-29 Thread jim holtman
Is this what you want?

 set.seed(1)
 x - matrix(sample(c(1, NA), 100, TRUE), nrow=10) # creat some data
 x
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]11   NA1   NA1   NA11 1
 [2,]111   NA   NA   NA1   NA   NA 1
 [3,]   NA   NA   NA1   NA1111NA
 [4,]   NA111   NA1111NA
 [5,]1   NA1   NA   NA1   NA1   NANA
 [6,]   NA11   NA   NA11   NA1NA
 [7,]   NA   NA1   NA111   NA   NA 1
 [8,]   NA   NA111   NA   NA11 1
 [9,]   NA1   NA   NA   NA   NA1   NA1NA
[10,]1   NA11   NA1   NA   NA1NA
 # count number of NAs per row
 numNAs - apply(x, 1, function(z) sum(is.na(z)))
 numNAs
 [1] 3 5 5 3 6 5 5 4 7 5
 # remove rows with more than 5 NAs
 x[!(numNAs  5),]
 [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]11   NA1   NA1   NA11 1
[2,]111   NA   NA   NA1   NA   NA 1
[3,]   NA   NA   NA1   NA1111NA
[4,]   NA111   NA1111NA
[5,]   NA11   NA   NA11   NA1NA
[6,]   NA   NA1   NA111   NA   NA 1
[7,]   NA   NA111   NA   NA11 1
[8,]1   NA11   NA1   NA   NA1NA




On 7/28/06, John Morrow [EMAIL PROTECTED] wrote:

 Dear R-Helpers,

 I have a large data matrix (9707 rows, 60 columns), which contains missing
 data. The matrix looks something like this:

 1) X X X X X X  NA  X X X X X X X X X

 2) NA NA NA NA X NA NA NA X NA NA

 3) NA NA X NA NA NA NA NA NA NA

 5) NA X NA X X X NA X X X X NA X

 ..

 9708) X NA NA X NA NA X X NA NA X

 .and so on. Notice that every row has a varying number of entries, all
 rows
 have at least one entry, but some rows have too much missing data.  My
 goal
 is to filter out/remove rows that have ~5 (this number is yet to be
 determined, but let's say its 5) missing entries before I run pearsons to
 tell me correlation between all of the rows.  The order of the columns
 does
 not matter here.
 I think that I might need to test each row for a data, at least one NA,
 data pattern?

 Is there some kind of way of doing this? I am at a loss for an easy way to
 accomplishing this. Any suggestions are most appreciated!

 John Morrow




[[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] memory problems when combining randomForests

2006-07-29 Thread Eleni Rapsomaniki
Hello again,

The reason why I thought the order at which rows are passed to randomForest
affect the error rate is because I get different results for different ways of
splitting my positive/negative data. 

First get the data (attached with this email)
pos.df=read.table(C:/Program Files/R/rw2011/pos.df, header=T)
neg.df=read.table(C:/Program Files/R/rw2011/neg.df, header=T)
library(randomForest)
#The first 2 columns are explanatory variables (which incidentally are not
discriminative at all if one looks at their distributions), the 3rd is the
class (pos or neg) 

train2test.ratio=8/10
min_len=min(nrow(pos.df), nrow(neg.df))
class_index=which(names(pos.df)==class) #is the same for neg.df
train_size=as.integer(min_len*train2test.ratio)

   Way 1
train.indicesP=sample(seq(1:nrow(pos.df)), size=train_size, replace=FALSE)
train.indicesN=sample(seq(1:nrow(neg.df)), size=train_size, replace=FALSE)

trainP=pos.df[train.indicesP,]
trainN=neg.df[train.indicesN,]
testP=pos.df[-train.indicesP,]
testN=neg.df[-train.indicesN,]

mydata.rf - randomForest(x=rbind(trainP, trainN)[,-class_index],
y=rbind(trainP, trainN)[,class_index], xtest=rbind(testP,
testN)[,-class_index], ytest=rbind(testP, testN)[,class_index],
importance=TRUE,proximity=FALSE, keep.forest=FALSE)
mydata.rf$test$confusion

##   Way 2
ind - sample(2, min(nrow(pos.df), nrow(neg.df)), replace = TRUE,
prob=c(train2test.ratio, (1-train2test.ratio)))
trainP=pos.df[ind==1,]
trainN=neg.df[ind==1,]
testP=pos.df[ind==2,]
testN=neg.df[ind==2,]

mydata.rf - randomForest(x=rbind(trainP, trainN)[,-dir_index], y=rbind(trainP,
trainN)[,dir_index], xtest=rbind(testP, testN)[,-dir_index], ytest=rbind(testP,
testN)[,dir_index], importance=TRUE,proximity=FALSE, keep.forest=FALSE)
mydata.rf$test$confusion

### Way 3
subset_start=1
subset_end=subset_start+train_size
train_index=seq(subset_start:subset_end)
trainP=pos.df[train_index,]
trainN=neg.df[train_index,]
testP=pos.df[-train_index,]
testN=neg.df[-train_index,]

mydata.rf - randomForest(x=rbind(trainP, trainN)[,-dir_index], y=rbind(trainP,
trainN)[,dir_index], xtest=rbind(testP, testN)[,-dir_index], ytest=rbind(testP,
testN)[,dir_index], importance=TRUE,proximity=FALSE, keep.forest=FALSE)
mydata.rf$test$confusion

### end

The first 2 methods give me an abnormally low error rate (compared to what I
get using the same data on a naiveBayes method) while the last one seems more
realistic, but the difference in error rates is very significant. I need to use
the last method to cross-validate subsets of my data sequentially(the first two
methods use random rows throughout the length of the data), unless there is a
better way to do it (?). Something must be very different between the first 2
methods and the last, but which is the correct one?

I would greatly appreciate any suggestions on this!

Many Thanks
Eleni Rapsomaniki

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] maximum likelihood

2006-07-29 Thread Ben Bolker

Alexandre Bonnet bonnet at gmail.com writes:

 
 *hi,*
 
 *using articial data, i'm supposed to estimate model*
 
 *y(t) = beta(1) + beta(2)*x(t) + u(t), u(t) = gamma*u(t-1) + v(t), t =
 1,...,100*
 
 *which is correctly specified in two ways: ML ommiting the first
 observation, and ML using all 100 observation.*
 

 [etc.]

  Alexandre,

  I think you're not getting any answers to this question
because it's a bit too vague (we have no context as to
_why_ you want to do this -- it also sounds a bit like
a homework question ...), and because it's stated much more
as a general statistical methodology question than as
an R question.  Please read the posting guide ...

  Doing this without specifying any parametric forms
would be tricky.  You may be able to do this by
searching for autoregressive model methods in RSiteSearch ...

  Ben Bolker

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] (no subject)

2006-07-29 Thread Douglas Bates
On 7/29/06, jim holtman [EMAIL PROTECTED] wrote:
 Is this what you want?

  set.seed(1)
  x - matrix(sample(c(1, NA), 100, TRUE), nrow=10) # creat some data
  x
   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
  [1,]11   NA1   NA1   NA11 1
  [2,]111   NA   NA   NA1   NA   NA 1
  [3,]   NA   NA   NA1   NA1111NA
  [4,]   NA111   NA1111NA
  [5,]1   NA1   NA   NA1   NA1   NANA
  [6,]   NA11   NA   NA11   NA1NA
  [7,]   NA   NA1   NA111   NA   NA 1
  [8,]   NA   NA111   NA   NA11 1
  [9,]   NA1   NA   NA   NA   NA1   NA1NA
 [10,]1   NA11   NA1   NA   NA1NA
  # count number of NAs per row
  numNAs - apply(x, 1, function(z) sum(is.na(z)))

It's a minor point but on a large matrix it would be better to use

numNAs - rowSums(is.na(z))

  numNAs
  [1] 3 5 5 3 6 5 5 4 7 5
  # remove rows with more than 5 NAs
  x[!(numNAs  5),]
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]11   NA1   NA1   NA11 1
 [2,]111   NA   NA   NA1   NA   NA 1
 [3,]   NA   NA   NA1   NA1111NA
 [4,]   NA111   NA1111NA
 [5,]   NA11   NA   NA11   NA1NA
 [6,]   NA   NA1   NA111   NA   NA 1
 [7,]   NA   NA111   NA   NA11 1
 [8,]1   NA11   NA1   NA   NA1NA
 



 On 7/28/06, John Morrow [EMAIL PROTECTED] wrote:
 
  Dear R-Helpers,
 
  I have a large data matrix (9707 rows, 60 columns), which contains missing
  data. The matrix looks something like this:
 
  1) X X X X X X  NA  X X X X X X X X X
 
  2) NA NA NA NA X NA NA NA X NA NA
 
  3) NA NA X NA NA NA NA NA NA NA
 
  5) NA X NA X X X NA X X X X NA X
 
  ..
 
  9708) X NA NA X NA NA X X NA NA X
 
  .and so on. Notice that every row has a varying number of entries, all
  rows
  have at least one entry, but some rows have too much missing data.  My
  goal
  is to filter out/remove rows that have ~5 (this number is yet to be
  determined, but let's say its 5) missing entries before I run pearsons to
  tell me correlation between all of the rows.  The order of the columns
  does
  not matter here.
  I think that I might need to test each row for a data, at least one NA,
  data pattern?
 
  Is there some kind of way of doing this? I am at a loss for an easy way to
  accomplishing this. Any suggestions are most appreciated!
 
  John Morrow
 
 
 
 
 [[alternative HTML version deleted]]
 
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 



 --
 Jim Holtman
 Cincinnati, OH
 +1 513 646 9390

 What is the problem you are trying to solve?

 [[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] negative binomial lmer

2006-07-29 Thread Douglas Bates
On 7/28/06, Gregor Gorjanc [EMAIL PROTECTED] wrote:
 Ben Bolker bolker at ufl.edu writes:

 ...
   I haven't tried it, but you could also consider using
  a Poisson-lognormal (rather than neg binomial, which is Poisson-gamma)
  distribution, which might make this all work rather well
  in lmer:
 
  www.cefe.cnrs.fr/esp/TBElston_Parasitology2001.pdf

 Actually it is very simple

 lmer(y ~ effA + (1 | effB), family=quasipoisson)

 i.e. this fits the following model

 y_ijk ~ Poisson(\lambda_ijk)
 log(lambda_ijk) = \mu + effaA_i + effB_ij + e_ijk
 effB_i ~ Normal(0, \sigma^2_b)
 e_ijk ~ Normal(0, \sigma^2_e)

 Gregor

I would advise checking the results from lmer against those from
another way of fitting this model or the negative binomial model.
There may be a problem in the way that lmer handles the scale
parameter.  I haven't checked  generalized linear mixed models with a
scale parameter as extensively as I have checked those without a
separate scale parameter (the binomial and Poisson families).  If
anyone can provide me with an example of such a model and sample data
(preferably off-list) I would appreciate it.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] random effects with lmer() and lme(), three random factors

2006-07-29 Thread Douglas Bates
On 7/28/06, Xianqun (Wilson) Wang [EMAIL PROTECTED] wrote:
 Hi, all,

 I have a question about random effects model. I am dealing with a
 three-factor experiment dataset. The response variable y is modeled
 against three factors: Samples, Operators, and Runs. The experimental
 design is as follow:

 4 samples were randomly chosen from a large pool of test samples. Each
 of the 4 samples was analyzed by 4 operators, randomly selected from a
 group of operators. Each operator independently analyzed same samples
 over 5 runs (runs nested in operator). I would like to know the
 following things:

 (1) the standard deviation within each run;

 (2) the standard deviation between runs;

 (3) the standard deviation within operator

 (4) the standard deviation between operator.

 With this data, I assumed the three factors are all random effects. So
 the model I am looking for is

 Model:  y  = Samples(random) + Operator(random) + Operator:Run(random) +
 Error(Operator) + Error(Operator:Run)  + Residuals

 I am using lme function in nlme package. Here is the R code I have

 1.   using lme:

 First I created a grouped data using

 gx - groupedData(y ~ 1 | Sample, data=x)

 gx$dummy - factor(rep(1,nrow(gx)))

 then I run the lme

 fm- lme(y ~ 1, data=gx,
 random=list(dummy=pdBlocked(list(pdIdent(~Sample-1),
 pdIdent(~Operator-1),
 pdIdent(~Operator:Run-1)

 finally, I use VarCorr to extract the variance components

vc - VarCorr(fm)
  Variance   StdDev
 Operator:Run 1.595713e-10:20   1.263215e-05:20
 Sample   5.035235e+00: 4   2.243933e+00: 4
 Operator 5.483145e-04: 4   2.341612e-02: 4
 Residuals8.543601e-02: 1   2.922944e-01: 1

 2.  Using lmer in Matrix package:

 fm - lmer(y ~ (1 | Sample) + (1 | Operator) +
(1|Operator:Run), data=x)

That model specification can now be written as

fm - lmer(y ~ (1|Sample) + (1|Operator/Run), x)

  summary(fm)

 Linear mixed-effects model fit by REML
 Formula: H.I.Index ~ (1 | Sample.Name) + (1 | Operator) + (1 |
 Operator:Run)
   Data: x
   AIC  BIClogLik MLdeviance REMLdeviance
  96.73522 109.0108 -44.36761   90.80064 88.73522
 Random effects:
  Groups   NameVariance   Std.Dev.
  Operator:Run (Intercept) 4.2718e-11 6.5359e-06
  Operator (Intercept) 5.4821e-04 2.3414e-02
  Sample   (Intercept) 5.0352e+00 2.2439e+00
  Residual 8.5436e-02 2.9229e-01
 number of obs: 159, groups: Operator:Run, 20; Operator, 4; Sample.Name,
 4

 Fixed effects:
  Estimate Std. Error  t value
 (Intercept) 0.0020818  1.1222683 0.001855

 There is a difference between lmer and lme is for the factor
 Operator:Run.

It's just a matter of round-off.  It is possible for the ML or REML
estimates of a variance component to be zero, as is the case here, but
the current computational methods do not allow the value zero because
this will cause some of the matrix decompositions to fail.  In lmer we
use a constrained optimization with the relative variance (variance of
a random effect divided by the residual variance) constrained to be
greater than or equal to 5e-10, which is exactly the value you have
here.

I'll add code to the model fitting routine to warn the user when
convergence to the boundary value occurs.  I haven't done that in the
past because it is not always easy to explain what is occurring.
For a model with variance components only, like yours, convergence on
the boundary means that an estimated variance component is zero.  In
the case of bivariate or multivariate random effects the
variance-covariance matrix can be singular without either of the
variances being zero.

The bottom line for you is that the estimated variance for
Operator:Run is zero and you should reduce the model to y ~ (1|Sample)
+ (1|Operator)


I cannot find where the problem is. Could anyone point me
 out if my model specification is correct for the problem I am dealing
 with. I am pretty new user to lme and lmer. Thanks for your help!





 Wilson Wang






 [[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Colours in Lattice

2006-07-29 Thread Lorenzo Isella
Dear All,

I am practicing with the image and wireframe (the latter in the lattice 
package) plotting tools.
I am a bit puzzled by the colors I observe in some test plots I have 
been generating.
Consider:

rm(list=ls())
library(lattice)
x - seq(-2*pi, 2*pi, len = 100)
y - seq(-2*pi, 2*pi, len = 100)
g - expand.grid(x = x, y = y)
mylen-length(x)


g$z - exp(-(g$x^2 + g$y^2))+exp(-((g$x-2)^2 + (g$y-2)^2))

pdf(my-first-lattice-figure.pdf)
print(wireframe(z ~ x * y, g, drape = TRUE,shade=TRUE,scales = 
list(arrows = FALSE),pretty=FALSE, aspect = c(1,1), colorkey = TRUE
,zoom=0.8, zlab = list(my radial function,rot = 
90),distance=0.0,perspective=TRUE,screen = list(z = 150, x = -55,y= 
0),ylim=range(c(-2*pi,2*pi)),xlim=range(c(-2*pi,2*pi)),zlim=range(c(0,1
dev.off()

Now, I should have two peaks corresponding to values slightly above 1 
which should be  cyan, whereas I see a different color. Furthermore, the 
floor of the function should correspond to about zero,  so I would 
expect it to be light purple rather than dark red.
Am I making any mistake with the options in the plotting? I suppose it 
all has to do with the shade argument, but using shade=FALSE leads to a 
very ugly-looking plot.
Is it possible to overlap another function to the previous one? I am 
thinking about something like the lines command when I use plot.
Finally, when I create a plot using image, is there a way to have a 
legenda of the values corresponding to each color (like the colorkey 
argument in wireframe)?
Kind Regards

Lorenzo

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] placing rectangle behind plot

2006-07-29 Thread Gabor Grothendieck
I am trying to create a lattice plot and would like to later, i.e. after
the plot is drawn, add a grey rectangle behind a portion of it.
The following works except that the rectrangle is on top of and
obscures a portion of the chart.  I also tried adding col = transparent
to the gpar list but that did not help -- I am on windows and
perhaps the windows device does not support transparency?
At any rate, how can I place the rectangle behind the plotted
points without drawing the rectangle first?

library(lattice)
library(grid)
trellis.unfocus()
x - 1:10
xyplot(x ~ x | gl(2,1), layout = 1:2)
trellis.focus(panel, 1, 1)
grid.rect(w = .5, gp = gpar(fill = light grey))
trellis.unfocus()

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] uniroot

2006-07-29 Thread nurza m
Hello,

I am struggling to find the root of a exponent 
function.
uniroot is complaining about a values at end points
not of opposite sign?

 
s- sapply(1:length(w),function(i)
+  {
+  
+ + 
+ 
+
uniroot(saeqn,lower=-5000,upper=0.01036597923,l=list(t=w[i],gp=gp))$root
+  })
Error in uniroot(saeqn, lower = -5000, upper =
0.01036597923, l = list(t = w[i],  : 
f() values at end points not of opposite sign
 


 
and here is my fonction saeqn.


 saeqn-function(s,l)
+  {
+ 
+ 
+ p- exp(-l$gp$lambda+s)*l$gp$c
+ 
+ 
+
k11-(l$gp$mu*(l$gp$lambda^2)*l$gp$c-s*l$gp$lambda*l$gp$c*l$gp$mu+l$gp$mu*l$gp$lambda)*p
+ 
+ k12 -
-l$gp$mu*l$gp$lambda-s^2+2*s*l$gp$lambda-(l$gp$lambda^2)
+ 
+ k13 -k11+k12
+ 
+
k14-(l$gp$lambda-s)*(-l$gp$mu*s-s*l$gp$lambda+s^2+l$gp$mu*l$gp$lambda*p)
+ 
+ k1- -k13/k14
+ 
+  k1-l$t
+  }

 

. There is something I must be missing since I never
had 
luck with uniroot!

Thanks,

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Reading multiple txt files into one data frame

2006-07-29 Thread Kartik Pappu
Hello All,

I have a device that spews out experimental data as a series of text
files each of which contains one column with several rows of numeric
data.  My problem is that for each trial it gives me one text file
(and I run between 30 to 50 trials at a time) and I would ideally like
to merge all these text files into one large data frame with each
column representing a single trial.  It is not a problem if NA
characters are added to make all the columna of eaqual length.  Right
now I am doing this by opening each file individually and cutting and
pasting the data into an excel file.  How can I do this in R assuming
all my text files are in one directory.

Is it also possible to customize the column headers.  For example if I
have 32 trials and 16 are experimental and 16 are control and I want
to name the columns Expt1, Expt2,... Expt16  and the control
columns Cntl1,...Cntl16.

Kartik

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading multiple txt files into one data frame

2006-07-29 Thread jim holtman
This will read in all the data files in a directory.  I am assuming that
your file names are the same as the column names you want.

# use file names as column headers
setwd(...where ever you want it)
result - list()
for (i in list.files()){
result[[i]] - scan(i, what=0)  # assume single column of numbers
}
max.length - max(unlist(lapply(result, length)))  # get maximum length
# pad with NAs
newData - lapply(result, function(x){length(x) - max.length; x})
yourDF - data.frame(newData)


On 7/29/06, Kartik Pappu [EMAIL PROTECTED] wrote:

 Hello All,

 I have a device that spews out experimental data as a series of text
 files each of which contains one column with several rows of numeric
 data.  My problem is that for each trial it gives me one text file
 (and I run between 30 to 50 trials at a time) and I would ideally like
 to merge all these text files into one large data frame with each
 column representing a single trial.  It is not a problem if NA
 characters are added to make all the columna of eaqual length.  Right
 now I am doing this by opening each file individually and cutting and
 pasting the data into an excel file.  How can I do this in R assuming
 all my text files are in one directory.

 Is it also possible to customize the column headers.  For example if I
 have 32 trials and 16 are experimental and 16 are control and I want
 to name the columns Expt1, Expt2,... Expt16  and the control
 columns Cntl1,...Cntl16.

 Kartik

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Log color scale

2006-07-29 Thread Kartik Pappu
Hi all,

In response to a previous post about plotting a numeric square matrix
as a colored matrix, I was referred to both image and the
color2D.matplot function in the plotrix package.  Both have worked for
me thanks!!

However I need to plot my data in a log transformed color scale.  Is
this possible?  I will be happy to explain further, but basically I
need to do this because there are large variations in the max and min
values of my raw data and I am trying to highlight the differences in
the values at the lower end of my raw data (while still displaying the
values at the high end of the spectrum for comparison) so if figured
the best way to represent this on a (RGB) color scale is to do it with
a log transform.  I do not want to use too many colors because that
make the figure too busy.

Any suggestions on how to achieve this.

Kartik

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] placing rectangle behind plot

2006-07-29 Thread Sebastian P. Luque
Hi Gabor,


On Sat, 29 Jul 2006 17:20:29 -0400,
Gabor Grothendieck [EMAIL PROTECTED] wrote:

 I am trying to create a lattice plot and would like to later, i.e. after
 the plot is drawn, add a grey rectangle behind a portion of it.  The
 following works except that the rectrangle is on top of and obscures a
 portion of the chart.  I also tried adding col = transparent to the
 gpar list but that did not help -- I am on windows and perhaps the
 windows device does not support transparency?  At any rate, how can I
 place the rectangle behind the plotted points without drawing the
 rectangle first?

If you only need to draw the rectangle behind the points, why not
'panel.polygon' before 'panel.xyplot'?


xyplot(x ~ x | gl(2, 1), layout=1:2,
   panel=function(x, y, ...) {
   panel.polygon(c(3, 3, 8, 8), c(0, 12, 12, 0), col=2)
   panel.xyplot(x, y, ...)
   })


Cheers,

-- 
Seb

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] placing rectangle behind plot

2006-07-29 Thread Gabor Grothendieck
The reason I explicitly specified in the problem that the rectangle should
not be drawn first is that the xyplot is issued as part of a
larger routine that I don't want to modify.

On 7/29/06, Sebastian P. Luque [EMAIL PROTECTED] wrote:
 Hi Gabor,


 On Sat, 29 Jul 2006 17:20:29 -0400,
 Gabor Grothendieck [EMAIL PROTECTED] wrote:

  I am trying to create a lattice plot and would like to later, i.e. after
  the plot is drawn, add a grey rectangle behind a portion of it.  The
  following works except that the rectrangle is on top of and obscures a
  portion of the chart.  I also tried adding col = transparent to the
  gpar list but that did not help -- I am on windows and perhaps the
  windows device does not support transparency?  At any rate, how can I
  place the rectangle behind the plotted points without drawing the
  rectangle first?

 If you only need to draw the rectangle behind the points, why not
 'panel.polygon' before 'panel.xyplot'?


 xyplot(x ~ x | gl(2, 1), layout=1:2,
   panel=function(x, y, ...) {
   panel.polygon(c(3, 3, 8, 8), c(0, 12, 12, 0), col=2)
   panel.xyplot(x, y, ...)
   })


 Cheers,

 --
 Seb

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] nested repeated measures in R

2006-07-29 Thread Spencer Graves
  For nested, repeated measures of normally distributed data, the best 
capability available in R (and perhaps anywhere) is in the 'nlme' 
package.  Excellent documentation is available in Pinheiro and Bates 
(2000) Mixed-Effects Models in S and S-Plus (Springer).  This book is 
also a superlative research monograph by one of the leading contributors 
in this area (Bates) and one of his former graduate students (Pinheiro). 
  If your data are not normal, you may want to use 'lmer' in the 'lme4' 
and 'Matrix' packages;  if you need that, please search the archives for 
more information on that.

  For help getting data into R, if you have R installed, help.start() 
provides easy access to standard documentation shipped with R and 
available without Internet access.  See especially R Data 
Import/Export and An Introduction to R.

  The simplest way I've found to get data into R is to first store it 
something like *.csv or *.txt format, plain text with a unique sep 
character that is only used to separate fields.  Then I store the file 
name (with the path if different from 'getwd()') in a variable like 
'File'.  Note that \ is an escape character in R, and to get a 
character interpreted as \, you must use either \ or /.

  Then I use readLines to check the format, including identifying the 
sep character.  Then I use count.fields to determine the number of 
fields in each record and the number of records.  If I don't get the 
right answer there, I may not have the correct sep character.  Or I 
need to read the file in pieces, because of problems in the file.

  Then I use read.data.  It has many arguments that can be used to 
modify the sep character, skip lines at the beginning of the file and 
then read only a certain number of lines, per the output from 
count.fields.

  Hope this helps.
  Spencer Graves

Doran, Harold wrote:
 You can read the manual, read the FAQs, search the help functions and the 
 archives.
 
 
 -Original Message-
 From: [EMAIL PROTECTED] on behalf of Eric C Merten
 Sent: Fri 7/21/2006 8:51 PM
 To: r-help@stat.math.ethz.ch
 Subject: [R] nested repeated measures in R
  
 R help,
 
 How would I input data, verify assumptions, and run a nested repeated
 measures ANOVA using R?  I have STATION nested in SITE nested in BLOCK with
 measurements repeated for five YEARs.  All are random variables and it's only
 slightly unbalanced.  I'm trying to characterize spatiotemporal variation in
 stream habitat variables.  Thanks for your help!
 
 Eric Merten
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
   [[alternative HTML version deleted]]
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Question about data used to fit the mixed model

2006-07-29 Thread Nantachai Kantanantha
Hi everyone,

I would like to ask a question regarding to the data used to fit the mixed 
model.

I wonder that, for the response variable data used to fit the mixed model 
(either via spm or lme), we must have several observations per subject 
(i.e. Yij,  i = 1,..,M,  j = 1,.., ni) or it can be just one observation per 
subject (i.e. Yi,  i = 1,...,M). Since we have to specify the groups for 
random effect components, if we have only one observation per subject, then 
each group will have only one observation.

Thank you vert much for your help.
Sincerely yours,

Nantachai

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.