Re: [R] convert a tar.gz to a Windows zip

2007-10-02 Thread Katharine Mullen
Uwe Ligges seems to be providing a simple way to make the conversion
(though I have not used it):
http://win-builder.r-project.org/

On Mon, 1 Oct 2007, Edna Bell wrote:

 Dear R Gurus;

 Is there a simple way to convert a Linux produced tar.gz file (a
 package) to a Windows binary zip package, please?

 Thanks in advance

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] function for special matrix design

2007-10-02 Thread Wayne.W.Jones
Lookup: 

?diag
?upper.tri
?lower.tri

Example: 

my.mat-matrix(1:16,ncol=4)
my.mat
my.mat[upper.tri(my.mat)]-3
my.mat


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of kevinchang
Sent: 01 October 2007 17:37
To: r-help@r-project.org
Subject: [R] function for special matrix design



Hi All,

I am trying to make a matrix with a particular value for all the elements
above and including diagonal and another particular value for all the
elements below diagonal. Obviously , Matrix function in Matrix package does
not work since it only allow filling by either row or column. So I am
wondering if there is a built-in function allowing me to do this ? please
help. Thanks



-- 
View this message in context: 
http://www.nabble.com/function-for-special-matrix-design-tf4549437.html#a12982709
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Disentagling formulas

2007-10-02 Thread Eik Vettorazzi
I tried

formula-y~X1+X2+X3
update.formula(formula,.-NewRandomEffects~.)

and this runs without an error. So I still didn't get the point.

You may also need to insulate your new response variable if you want to 
estimate a model with in fact only one response variable, which ist done 
by I() as in

update.formula(formula,I(.-NewRandomEffects)~.)

hrth.

Rebecca Sela schrieb:
 Update looks like the function I want to use, but it still isn't working, 
 possibly because I am doing something more complicated than my message 
 implied.

 What I want to do is subtract a vector of random effects (determined within 
 the function) from Y.  I hoped that something like this:
   tree - rpart(update(formula,.-NewRandomEffects~.))
 would work, but I got the message
 Error in update.default(formula, . - NewRandomEffects ~ .) : 
 need an object with call component

 I have considered creating a function that would subtract the random effect 
 from each Y individually, but there is no identifier on Y.  

 Is there an obvious fix I'm missing?

 Thanks for your suggestions!

 Rebecca



 - Original Message -
 From: Eik Vettorazzi [EMAIL PROTECTED]
 To: Rebecca Sela [EMAIL PROTECTED]
 Cc: r-help@r-project.org
 Sent: Monday, October 1, 2007 11:34:13 AM (GMT-0500) America/New_York
 Subject: Re: [R] Disentagling formulas

 maybe I didn't understand the problem, but you can do sth like

 formula-y~X1+X2+X3
 lm(update.formula(formula,f(Y)~.))

 maybe
 ?all.vars
 or
 ?terms
 will help you too.

 hth.


 Rebecca Sela schrieb:
   
 I am writing a program in which I would like to take in a formula, change 
 the response (Y) variable into something else, and then pass the formula, 
 with the new Y variable to another function.  That is, I am starting with 
  formula - Y~X1+X2+X3
 and I'd like to do something like
  Y - formula$Y
  newY - f(Y)
  lm(newY~X1+X2+X3)
 So far, it seems that my only option will be a very complicated sequence of 
 steps involving match.call().  Is there a simpler way to change the response 
 variable in a formula?

 Thanks in advance!

 Rebecca

 --
 Rebecca Sela
 Statistics Department
 Stern School of Business
 New York University

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
   
 


   


-- 
Eik Vettorazzi
Institut für Medizinische Biometrie und Epidemiologie
Universitätsklinikum Hamburg-Eppendorf

Martinistr. 52
20246 Hamburg

T ++49/40/42803-8243
F ++49/40/42803-7790

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] problem with ARES

2007-10-02 Thread chevolot

Dear mailing list:

I am a new user of R and I would like to use the new ARES package. I have
followed the procedure to import a genepopfile and everything seemed to
work. However when I ran aresCalc function, I got a message error and I
don't know what it means. The error message is object alri.tot not found.
Can anyone help me?

Thank you,

Malia

-- 
View this message in context: 
http://www.nabble.com/problem-with-ARES-tf4553922.html#a12995734
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Odp: Calculating group means using self-written function

2007-10-02 Thread Petr PIKAL
Hi

[EMAIL PROTECTED] napsal dne 02.10.2007 10:44:20:

 Hi R-users,
 
 Suppose I have a following data set.
 
 y1 - rnorm(20) + 6.8
 y2 - rnorm(20) + (1:20*1.7 + 1)
 y3 - rnorm(20) + (1:20*6.7 + 3.7)
 y - c(y1,y2,y3)
 var1 - rep(1:5,12)
 z - rep(1:6,10)
 f - gl(3,20, labels=paste(lev, 1:3, sep=))
 d - data.frame(var1=var1, z=z,y=y, f=f)
 
 Using following code I can calculate group means
 
 library(doBy)
 summaryBy(y ~ f + var1, data=d, FUN=mean)
 
 How do I have to modify the FUN argument if I want to calculate mean
 using unique values
 
 for instance
 
 fun - function(x, y) sum(x)/length(unique(y))
 summaryBy(y ~ f + var1, data=d, FUN=fun(y, z)
 
 Error in get(x, envir, mode, inherits) : variable currFUN of mode
 function was not found

Not sure how to do it in doBy but using aggregate

aggregate(d$y, list(d$var1,d$f), fun, y=z)

probably do what you want

Regards
Petr

 
 Best regards
 LN
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mean of subset of rows

2007-10-02 Thread darteta001
Thankyou all for your answers, I have decided using aggregate() but I 
will keep in mind tapply(). I was wondering if it is possible to tell 
aggregate to use two functions at the same time, i.e., mean() and sd
(), or is it better to call aggregate() two times, one for mean, and 
another for sd and then cbind both results.

Also, if the data.frame has now three columns, ID-size-Town, as 
follows:

data-data.frame(ID=c(rep(letters[1:4],2),rep(f,12)), size=runif
(20),Town=c(rep(LETTERS[1:2],4),rep(C,12)))

And I want to produce a table with the mean size for each ID keeping 
Town in the result table, I cannot get to add the Town for each ID

avgs -aggregate(data$size, by = list(data$ID), mean)
SDs -aggregate(data$size, by = list(data$ID), sd)

results = cbind(avgs,SDs[2],data$Town)
Error in data.frame(..., check.names = FALSE) : 
arguments imply differing number of rows: 5, 20


Thanks again!

David

 You were on the right track with the for loop, but often you can do  
 the same thing looplessly (I know, it's not really a word) in R:
 
 If your data is like this:
 
 data-data.frame(ID=rep(letters[1:4], 5), size=runif(20))
 
 then apply either
 
 tapply(data$size, data$ID, mean)
 
 or
 
 aggregate(data$size, list(data$ID), mean)
 
 For further reference, section 4.2 in An Introduction to R  
 describes using tapply in this way.
 
 Jeff.
 
 On Oct 1, 2007, at 11:57 AM, [EMAIL PROTECTED]  
 [EMAIL PROTECTED] wrote:
 
  Dear list,
  this must be an easy one:
 
  I have a data.frame of two columns, ID with four different 
levels (A
  to D) and numerical size, and each of the 4 different IDs is
  repeated a
  different number of times. I would like to get the mean size for 
each
  ID as another data.frame. I have tried the following:
 
  ID= as.character(unique(data[,1])) # I use unique() because data
  will be larger in future
  nIDs = length(ID)
  for(i in 1:nIDs){
  +  subdata = subset(data,V1==ID[i])
  +  average = as.data.frame(cbind(1:i,ID[i],mean(subdata[,2]))
  + }
 
  Unfortunately, my output only gets the last level of ID four times:
  average
   V1 V2   V3
  1  1  D 179.7778
  2  2  D 179.7778
  3  3  D 179.7778
  4  4  D 179.7778
 
  How can I get what I need? there might be an easier way to do it, 
but
  I guess my skills aren´t that good. Any suggestions are welcome
 
  Regards,
 
  David
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting- 
  guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] building packages on Windows

2007-10-02 Thread Duncan Murdoch
On 01/10/2007 11:45 PM, Edna Bell wrote:
 Hi again.
 
 I'm sure that this is really simple.
 
 I'm trying to build a package on a Windows Vista machine.  I use
 Rcmd build --binary test
 
 but I get the Please set TMPDIR to a valid temporary directory
 
 I tried TMPDIR=c:\temp
 but to no avail.

I suspect you don't have write permission on c:\temp.

Windows systems typically have TEMP set to a writeable directory, and 
recent builds of R 2.6.0 will use that directory; in older versions you 
should be able to manually set TMPDIR to the same directory.

Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sample fromt he real line

2007-10-02 Thread Duncan Murdoch
On 01/10/2007 3:50 AM, Daniel Polhamus wrote:
 Hello R Gurus,
  
 This is a simple enough question, but I am curious as to whether there's an 
 answer...  Can R generate a random variable uniformly distributed on -Inf to 
 Inf?  Philosophically this doesn't seem possible, and if not, as I imagine 
 so, is there some sort of generally accepted factor I should be multiplying 
 by a Unif(-1,1) rv to sample from the real line?

There's no distribution that's uniform on the real line, but there are a 
lot of ways to sample non-uniformly from it.

A common approximation (e.g. to simulate a non-informative prior in 
Bayesian work) is to use a normal distribution with a large variance.
AFAIK there isn't any agreement on what large should be, because it 
depends so much on context.

Duncan Murdoch

 Thanks,
 Dan Polhamus
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] splom pairs and groups argument

2007-10-02 Thread deepayan . sarkar
On 10/2/07, GOUACHE David [EMAIL PROTECTED] wrote:
 Hello,

 I'm trying to pull off a certain graph using splom, and can't quite get my
 panel functions right.
 Basically, the equivalent using pairs would be something like this (using
 iris data set as an example):

 panel.corval - function(x, y, digits=2, prefix=, cex.cor,col,pch)
  {
  usr - par(usr); on.exit(par(usr))
  par(usr = c(0, 1, 0, 1))
  r - abs(cor(x, y,use=complete.obs))
  txt - format(c(r, 0.123456789), digits=digits)[1]
  txt - paste(prefix, txt, sep=)
  if(missing(cex.cor)) cex - 0.8/strwidth(txt)
  text(0.5, 0.5, txt, cex = cex )
  }
 pairs(iris[,1:4], lower.panel=panel.smooth,
 upper.panel=panel.corval,col=rainbow(nlevels(iris$Species))[iris$Species],pch=19)

Here's a more or less direct translation:


panel.corval2 - function(x, y, digits=2, prefix=, cex.cor, ...)
{
require(grid)
r - abs(cor(x, y, use = complete.obs))
txt - format(c(r, 0.123456789), digits=digits)[1]
txt - paste(prefix, txt, sep=)
if (missing(cex.cor)) cex.cor - 10 / nchar(txt)
grid.text(txt, 0.5, 0.5, gp = gpar(cex = cex.cor))
}


splom(iris[1:4], groups = iris$Species, pch = 16,
  lower.panel = function(...) {
  panel.xyplot(...)
  panel.loess(..., col = 1, lwd = 3)
  },
  upper.panel = panel.corval2)


 My goals are:
 1) to have in one panel (above or below the diagonal) the points themselves
 with a smoothing line over all the points and in the other the (absolute)
 value of the correlation coefficient, so as to get an overall idea of my
 correlations
 2) to be able to identify points depending on levels of one or more factors
 (hence the groups argument that is giving me a hard time in writing a
 succesfull panel function)
 3) I want to do this in splom as I want to build several scatterplot
 matrices subsetting my data over the levels of another factor...

 Could anyone please help me out with this ?

 Also, while trying to set up an example, I landed upon this behavior I
 couldn't quite make sense of:

 panel.corval - function(x, y, digits=2, prefix=, cex.cor)
  {
  usr - par(usr); on.exit(par(usr))
  par(usr = c(0, 1, 0, 1))
  r - abs(cor(x, y,use=complete.obs))
  txt - format(c(r, 0.123456789), digits=digits)[1]
  txt - paste(prefix, txt, sep=)
  if(missing(cex.cor)) cex - 0.8/strwidth(txt)
  text(0.5, 0.5, txt, cex = cex )
  }
 pairs(iris[,1:4], lower.panel=panel.smooth,
 upper.panel=panel.corval,col=rainbow(nlevels(iris$Species))[iris$Species],pch=19)

 Erreur dans lower.panel(...) : unused argument(s) (col = c(#FF,
 #FF, #FF,

 I can't understand why not specifying col and pch arguments in the function
 I use for upper.panel (which dosen't at all need them in this case) is a
 problem.

upper.panel will be called with pch, col, etc., even if they don't
``need'' them. Having a ... argument should be enough to capture
these.

 It's all the more confusing that the error message says the problem
 is with lower.panel... If anyone has a clue as to what's happening, please
 tell.

pairs.default is renaming things internally; hint: try

pairs(iris[,1:4], upper.panel=panel.corval,pch=19, row1attop=FALSE)

-Deepayan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculating group means using self-written function

2007-10-02 Thread Lauri Nikkinen
Thanks Petr,

Yes, your code seems to work. But when I try to reproduce it with my
original data set

fun - function(x, y) sum(x)/length(unique(y))
aggregate(vsid$lev, list(vsid$month, vsid$yeari), fun, vsid$lev=vsid$date)

I get

Error: syntax error, unexpected EQ_ASSIGN, expecting ',' in
aggregate(vsid$lev, list(vsid$month, vsid$year), fun, vsid$lev=

Can you intepret what is wrong? vsid$date is

$ date   :Class 'Date'  num [1:637] 13695 13695 13695 13695 13695 ...

Cheers,
Lauri

2007/10/2, Petr PIKAL [EMAIL PROTECTED]:
 Hi

 [EMAIL PROTECTED] napsal dne 02.10.2007 10:44:20:

  Hi R-users,
 
  Suppose I have a following data set.
 
  y1 - rnorm(20) + 6.8
  y2 - rnorm(20) + (1:20*1.7 + 1)
  y3 - rnorm(20) + (1:20*6.7 + 3.7)
  y - c(y1,y2,y3)
  var1 - rep(1:5,12)
  z - rep(1:6,10)
  f - gl(3,20, labels=paste(lev, 1:3, sep=))
  d - data.frame(var1=var1, z=z,y=y, f=f)
 
  Using following code I can calculate group means
 
  library(doBy)
  summaryBy(y ~ f + var1, data=d, FUN=mean)
 
  How do I have to modify the FUN argument if I want to calculate mean
  using unique values
 
  for instance
 
  fun - function(x, y) sum(x)/length(unique(y))
  summaryBy(y ~ f + var1, data=d, FUN=fun(y, z)
 
  Error in get(x, envir, mode, inherits) : variable currFUN of mode
  function was not found

 Not sure how to do it in doBy but using aggregate

 aggregate(d$y, list(d$var1,d$f), fun, y=z)

 probably do what you want

 Regards
 Petr

 
  Best regards
  LN
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Linear Regression

2007-10-02 Thread Dimitris Rizopoulos
look at ?summary.lm(), and specifically at the `Value' section, e.g., 
try this:

lmFit - lm(weight ~ group - 1)
summ.lmFit - summary(lmFit)
summ.lmFit$coefficients
summ.lmFit$r.squared


I hope it helps.

Best,
Dimitris


Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
 http://www.student.kuleuven.be/~m0390867/dimitris.htm


- Original Message - 
From: livia [EMAIL PROTECTED]
To: r-help@r-project.org
Sent: Tuesday, October 02, 2007 1:04 PM
Subject: [R] Linear Regression



 Hello,

 I would like to fit a linear regression and when I use summary(), I 
 got the
 following result:

 Call:
 lm(formula = weight ~ group - 1)

 Residuals:
Min  1Q  Median  3Q Max
 -1.0710 -0.4938  0.0685  0.2462  1.3690

 Coefficients:
 Estimate Std. Error t value Pr(|t|)
 groupCtl   5.0320 0.2202   22.85 9.55e-15 ***
 groupTrt   4.6610 0.2202   21.16 3.62e-14 ***
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 Residual standard error: 0.6964 on 18 degrees of freedom
 Multiple R-Squared: 0.9818, Adjusted R-squared: 0.9798
 F-statistic: 485.1 on 2 and 18 DF,  p-value:  2.2e-16

 In fact, I do not need them all. Is there a way of exacting part of 
 the
 infomation, like the Coefficient or Multiple R-Squared?
 -- 
 View this message in context: 
 http://www.nabble.com/Linear-Regression-tf4554258.html#a12996725
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 


Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Linear Regression

2007-10-02 Thread John Sorkin
Livia,
Try the following:

fit1-summary(lm(weight~group-1)
summary(fit1)
names(fit1)

#The code above wil fit the model and print the results. 
#The statement names(fit1) will give you the components of
#the summary such as
#coefficients, R.squared  adj.R.squared, etc.
#You can then access the components, viz.

coeffs-summary(fit1)$coefficients

#or 

RSq-summary(fit1)$r.squared


John
 



John Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

 livia [EMAIL PROTECTED] 10/2/2007 7:04 AM 

Hello,

I would like to fit a linear regression and when I use summary(), I got the
following result:

Call:
lm(formula = weight ~ group - 1)

Residuals:
Min  1Q  Median  3Q Max 
-1.0710 -0.4938  0.0685  0.2462  1.3690 

Coefficients:
 Estimate Std. Error t value Pr(|t|)
groupCtl   5.0320 0.2202   22.85 9.55e-15 ***
groupTrt   4.6610 0.2202   21.16 3.62e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 0.6964 on 18 degrees of freedom
Multiple R-Squared: 0.9818, Adjusted R-squared: 0.9798 
F-statistic: 485.1 on 2 and 18 DF,  p-value:  2.2e-16 

In fact, I do not need them all. Is there a way of exacting part of the
infomation, like the Coefficient or Multiple R-Squared?
-- 
View this message in context: 
http://www.nabble.com/Linear-Regression-tf4554258.html#a12996725 
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help 
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html 
and provide commented, minimal, self-contained, reproducible code.

Confidentiality Statement:
This email message, including any attachments, is for the so...{{dropped}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Linear Regression

2007-10-02 Thread John Sorkin
OOPPSS,
I forgot a right parenthesis

fit1-summary(lm(weight~group-1)   )




John Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

 John Sorkin [EMAIL PROTECTED] 10/2/2007 7:40 AM 
Livia,
Try the following:

fit1-summary(lm(weight~group-1)
summary(fit1)
names(fit1)

#The code above wil fit the model and print the results. 
#The statement names(fit1) will give you the components of
#the summary such as
#coefficients, R.squared  adj.R.squared, etc.
#You can then access the components, viz.

coeffs-summary(fit1)$coefficients

#or 

RSq-summary(fit1)$r.squared


John
 



John Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

 livia [EMAIL PROTECTED] 10/2/2007 7:04 AM 

Hello,

I would like to fit a linear regression and when I use summary(), I got the
following result:

Call:
lm(formula = weight ~ group - 1)

Residuals:
Min  1Q  Median  3Q Max 
-1.0710 -0.4938  0.0685  0.2462  1.3690 

Coefficients:
 Estimate Std. Error t value Pr(|t|)
groupCtl   5.0320 0.2202   22.85 9.55e-15 ***
groupTrt   4.6610 0.2202   21.16 3.62e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 0.6964 on 18 degrees of freedom
Multiple R-Squared: 0.9818, Adjusted R-squared: 0.9798 
F-statistic: 485.1 on 2 and 18 DF,  p-value:  2.2e-16 

In fact, I do not need them all. Is there a way of exacting part of the
infomation, like the Coefficient or Multiple R-Squared?
-- 
View this message in context: 
http://www.nabble.com/Linear-Regression-tf4554258.html#a12996725 
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help 
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html 
and provide commented, minimal, self-contained, reproducible code.

Confidentiality Statement:
This email message, including any attachments, is for the so...{{dropped}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help 
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html 
and provide commented, minimal, self-contained, reproducible code.

Confidentiality Statement:
This email message, including any attachments, is for the so...{{dropped}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Apparently Conflicting Results with coxph

2007-10-02 Thread Kevin E. Thorpe
Kevin E. Thorpe wrote:
 Peter Dalgaard wrote:
 Kevin E. Thorpe wrote:
 Dear List:

 I have a data frame prepared in the couting process style for including
 a binary time-dependent covariate.  The first few rows look like this.

 PtNo StartEnd Status Imp
 1  1 0  608.0  0   0
 2  2 0  513.0  0   0
 3  2   513  887.0  0   1
 4  3 0   57.0  0   0
 5  357  604.0  0   1
 6  4 0  150.0  1   0


 The outcome is mortality and the covariate is for an implantable
 defibrillator, so it is expected that the implant would reduce the
 risk of death.  The results of fitting coxph (survival package) are:

 Call:
 coxph(formula = Surv(Start, End, Status) ~ Imp, data = nina.excl)


  coef exp(coef) se(coef) zp
 Imp 0.163  1.180.485 0.337 0.74

 Likelihood ratio test=0.11  on 1 df, p=0.738  n= 335

 Since this was unexpected, I created a non-counting process data
 frame with an indicator variable representing received an implant
 or not.  Here are the results:

 Call:
 coxph(formula = Surv(Days, Dead) ~ Implant, data = nina.excl0)


  coef exp(coef) se(coef) z   p
 Implant -1.77 0.1710.426 -4.15 3.3e-05

 Likelihood ratio test=19.1  on 1 df, p=1.21e-05  n= 197

 I found this degree of discrepancy surprising, especially the point
 estimate of the coefficient.  I have verified the data frames are
 set up correctly.

 Here is what I have tried to understand what is going on.

 I tried fitting models adjusted for other covariates that I have in
 the data frame.  This did not appreciably affect the coefficients
 for the implant variable.

 I ran cox.zph on the two models shown above and plotted the results.
 In both cases, the point estimate of Beta(t) is sort of parabolic
 in that the curves are monotonically increasing to a local maximum
 after which they are monotonically decreasing (the CIs are a bit
 more wiggly).

 I would interpret this to mean that the effect of implant is probably
 time-dependent.  If so, how do I actually get a proper estimate of
 beta(t) for a variable like this?

 Are there some other things I should look at to understand what's
 going on?

   
 If you want to play with time-dependent regression coefficients have a
 look at the timereg package and the book that it supports.

 However, first you need to consider the possibility of selection effects
 that can take place even with non-varying effects. In the case at hand I
 would suspect a bias created by the fact that you don't implant devices
 into people who are already dead.

 
 Thanks.  The point in your last paragraph did cross my mind too.
 

I thought about this some more, and I'm not sure that possibility is
to blame.  In my time-dependent model, I don't think I'm doing
anything different than is done for transplant in the Stanford
Heart Study (the often used example for this kind of time-dependent
covariate).  As in my case, you would not transplant a dead patient.
So, I remain puzzled as to why my model is misbehaving.


-- 
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Department of Public Health Sciences
Faculty of Medicine, University of Toronto
email: [EMAIL PROTECTED]  Tel: 416.864.5776  Fax: 416.864.6057

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculating group means using self-written function

2007-10-02 Thread Lauri Nikkinen
Well, I finally found a roundabout

fun - function(x, y) sum(x)/max(y)
aggregate(vsid$lev, list(vsid$month, vsid$year), fun, y=by(vsid$date,
vsid$month, function(x) length(unique(x

Thanks,
Lauri

2007/10/2, Lauri Nikkinen [EMAIL PROTECTED]:
 Thanks Petr for your kind answer. I got it now but it seems that
 argument y will not be split by list(vsid$month, vsid$year) in the
 aggregate function. I should get number of days in each month in the
 denominator with length(unique(y)) but instead I get sum of days in
 months in the denominator. So I will not get correct answers. Should I
 modify my fun in some way?

 Best regards,
 Lauri

 2007/10/2, Petr PIKAL [EMAIL PROTECTED]:
  Hi
 
  [EMAIL PROTECTED] napsal dne 02.10.2007 13:19:09:
 
   Thanks Petr,
  
   Yes, your code seems to work. But when I try to reproduce it with my
   original data set
  
   fun - function(x, y) sum(x)/length(unique(y))
   aggregate(vsid$lev, list(vsid$month, vsid$yeari), fun,
  vsid$lev=vsid$date)
 
  Shall be
 
  aggregate(vsid$lev, list(vsid$month, vsid$yeari), fun, y=vsid$date)
 
  From help page
 
  ## S3 method for class 'data.frame':
  aggregate(x, by, FUN, ...)
 
  ...
  further arguments passed to or used by methods.
 
  Your function has 2 arguments one is x which is assigned vsid$lev and the
  other is y which you want to assign vsid$date. You can imagine that
  aggregate splits your x according to the levels mentioned in by and
  applies to each split a function fun together with any other argument,
  in your case y. So you need to provide a correct name to your function
  otherwise it does not know what to do.
 
  Regards
  Petr
 
  
   I get
  
   Error: syntax error, unexpected EQ_ASSIGN, expecting ',' in
   aggregate(vsid$lev, list(vsid$month, vsid$year), fun, vsid$lev=
  
   Can you intepret what is wrong? vsid$date is
  
   $ date   :Class 'Date'  num [1:637] 13695 13695 13695 13695 13695
  ...
  
   Cheers,
   Lauri
  
   2007/10/2, Petr PIKAL [EMAIL PROTECTED]:
Hi
   
[EMAIL PROTECTED] napsal dne 02.10.2007 10:44:20:
   
 Hi R-users,

 Suppose I have a following data set.

 y1 - rnorm(20) + 6.8
 y2 - rnorm(20) + (1:20*1.7 + 1)
 y3 - rnorm(20) + (1:20*6.7 + 3.7)
 y - c(y1,y2,y3)
 var1 - rep(1:5,12)
 z - rep(1:6,10)
 f - gl(3,20, labels=paste(lev, 1:3, sep=))
 d - data.frame(var1=var1, z=z,y=y, f=f)

 Using following code I can calculate group means

 library(doBy)
 summaryBy(y ~ f + var1, data=d, FUN=mean)

 How do I have to modify the FUN argument if I want to calculate mean
 using unique values

 for instance

 fun - function(x, y) sum(x)/length(unique(y))
 summaryBy(y ~ f + var1, data=d, FUN=fun(y, z)

 Error in get(x, envir, mode, inherits) : variable currFUN of mode
 function was not found
   
Not sure how to do it in doBy but using aggregate
   
aggregate(d$y, list(d$var1,d$f), fun, y=z)
   
probably do what you want
   
Regards
Petr
   

 Best regards
 LN

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
   
   
 
 


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculating group means using self-written function

2007-10-02 Thread Vladimir Eremeev


Lauri Nikkinen wrote:
 
 Suppose I have a following data set.
 
 y1 - rnorm(20) + 6.8
 y2 - rnorm(20) + (1:20*1.7 + 1)
 y3 - rnorm(20) + (1:20*6.7 + 3.7)
 y - c(y1,y2,y3)
 var1 - rep(1:5,12)
 z - rep(1:6,10)
 f - gl(3,20, labels=paste(lev, 1:3, sep=))
 d - data.frame(var1=var1, z=z,y=y, f=f)
 
 Using following code I can calculate group means
 
 library(doBy)
 summaryBy(y ~ f + var1, data=d, FUN=mean)
 
 How do I have to modify the FUN argument if I want to calculate mean
 using unique values
 
 for instance
 
 fun - function(x, y) sum(x)/length(unique(y))
 summaryBy(y ~ f + var1, data=d, FUN=fun(y, z)
 
 Error in get(x, envir, mode, inherits) : variable currFUN of mode
 function was not found
 
 Best regards
 LN
 

Your first call to summaryBy suggests, that the argument FUN must be a name
of a function.
fun(y,z) in the second call is evaluated before call to summaryBy.

fun(y,z) returns a single number, so you have 
summaryBy(y ~ f + var1, data=d, FUN=331.9017)

summaryBy wants to call FUN(x) and use its result and check if it is valid. 
It cannot call 331.9017 and gives the error.

Probably, you might want to change your code for something like following

fun - function(x, y=1) sum(x)/length(unique(y))
summaryBy(y ~ f + var1, data=d, FUN=fun, y=z)

More details should be given in ?summaryBy
-- 
View this message in context: 
http://www.nabble.com/Calculating-group-means-using-self-written-function-tf4553726.html#a12998681
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] using sprintf with dates

2007-10-02 Thread stephen bond
hello,

Please help with using sprintf with character variables:
The following does not produce what i intended

foot=function(){
  str1=format(Sys.Date,%Y%m%d)
  sprintf(99%-4s%s,nm,str1)
}

I wanted to have 99nm  20071002 as the output.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] using sprintf with dates

2007-10-02 Thread Jeffrey Horner
stephen bond wrote on 10/02/2007 08:36 AM:
 hello,
 
 Please help with using sprintf with character variables:
 The following does not produce what i intended
 
 foot=function(){
   str1=format(Sys.Date,%Y%m%d)
   sprintf(99%-4s%s,nm,str1)
 }
 
 I wanted to have 99nm  20071002 as the output.

You forgot the parens after Sys.Date:

foot=function(){
   str1=format(Sys.Date(),%Y%m%d)
   sprintf(99%-4s%s,nm,str1)
}

Jeff
-- 
http://biostat.mc.vanderbilt.edu/JeffreyHorner

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] using sprintf with dates

2007-10-02 Thread Alberto Monteiro
stephen bond wrote:
 
 foot=function(){
   str1=format(Sys.Date,%Y%m%d)
   sprintf(99%-4s%s,nm,str1)
 }
 
 I wanted to have 99nm  20071002 as the output.
 
Sys.Date is a function. It's perfectly possible
to write string - format(Sys.Date, %s) (or, generically,
string - format(sin, %s), etc), but it will just put some
description of the function in the string.

Also, the assignment in R is -, not =.

Try:

foot - function(){
  str1 - format(Sys.Date(),%Y%m%d)
  sprintf(99%-4s%s,nm,str1)
}

Alberto Monteiro

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] SVM: stratified cross-validation

2007-10-02 Thread Kaustubh Patil
Hi,

Is there a way to do stratified cross-validation of Support Vector Machines? 

Regards,
Kaustubh




   


the tools to get online.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Cannot Install rimage

2007-10-02 Thread Rick Bilonick
I'm trying to install rimage in R version 2.5.1 running on Fedora 6
(kernel 2.6.22.7-57.fc6 with the headers and gcc installed, along with
fftw2 and libjpeg and headers):

 install.packages(rimage)
Warning in install.packages(rimage) : argument 'lib' is missing: using
'/usr/lib/R/library'
trying URL
'http://lib.stat.cmu.edu/R/CRAN/src/contrib/rimage_0.5-7.tar.gz'
Content type 'application/x-gzip' length 331029 bytes
opened URL
==
downloaded 323Kb

* Installing *source* package 'rimage' ...
checking for g++... no
checking for c++... no
checking for gpp... no
checking for aCC... no
checking for CC... no
checking for cxx... no
checking for cc++... no
checking for cl... no
checking for FCC... no
checking for KCC... no
checking for RCC... no
checking for xlC_r... no
checking for xlC... no
checking for C++ compiler default output... configure: error: C++
compiler cannot create executables
See `config.log' for more details.
ERROR: configuration failed for package 'rimage'
** Removing '/usr/lib/R/library/rimage'

The downloaded packages are in
/tmp/RtmpGDx12u/downloaded_packages
Warning message:
installation of package 'rimage' had non-zero exit status in:
install.packages(rimage) 

I cannot find config.log that is mentioned. Can someone please tell me
what is missing from my configuration? rimage installed on another
system running Fedora 7 once I installed fftw2.

Rick B.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] mcv package gamm function Error in chol(XVX + S)

2007-10-02 Thread Dae-Jin Lee
Hi all R users !

I'm using gamm function from Simon Wood's mgcv package, to fit a spatial
regression Generalized Additive Mixed Model, as covariates I have the
geographical longitude and latitude locations of indexed data. I include a
random effect for each district (dist) so the code is

fit - gamm(y~s(lon,lat,bs=tp, m=2)+offset(log(exp.)),
random=list(dist=~1), family=poisson, niterPQL=30)

when I run the gamm function I obtain the next error message:

%%%
 Maximum number of PQL iterations:  30
iteration 1
iteration 2
...
iteration 8
iteration 9
iteration 10
Error in chol(XVX + S) : the leading minor of order 29 is not positive
definite
%%%

Could be any problem in gamm() ???

--

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to view the code of a method?

2007-10-02 Thread Strong
Dear All 

I am a biginner of R.

I have difficulty with reading the code of a method.   

I am using the vars package to estimate a VAR model and I want to view the code 
of predict method for objects with class attribute varest. 

I thougt I could just type the name predict without anything to display the 
code of the method as I often do with generic function. However, I got the 
following messages:

function (object, ...) 
UseMethod(predict)
environment: namespace:stats

I cannot figure out the meaning that the above message want to deliver to me 
and thus cannot find a way to view the code of the predict method with class 
varest. 

Can anyone help me out?

Thanks in advance.  


Strong Chen

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to view the code of a method?

2007-10-02 Thread Stefano Calza
For S3 methods I'd try something like

methods(predict)

 [1] predict.ar*predict.Arima* predict.arima0*  
  predict.glm   
 [5] predict.HoltWinters*   predict.lm predict.loess*   
  predict.mlm   
 [9] predict.nls*   predict.poly   predict.ppr* 
  predict.prcomp*   
[13] predict.princomp*  predict.smooth.spline* 
predict.smooth.spline.fit* predict.StructTS* 

   Non-visible functions are asterisked


and then e.g.

predict.lm

or

getAnywhere(predict.ar)

for those Non-visible


HIH

Stef



On Tue, Oct 02, 2007 at 11:52:34PM +0900, Strong wrote:
StrongDear All 
Strong
StrongI am a biginner of R.
Strong
StrongI have difficulty with reading the code of a method.   
Strong
StrongI am using the vars package to estimate a VAR model and I want to view 
the code of predict method for objects with class attribute varest. 
Strong
StrongI thougt I could just type the name predict without anything to 
display the code of the method as I often do with generic function. However, I 
got the following messages:
Strong
Strongfunction (object, ...) 
StrongUseMethod(predict)
Strongenvironment: namespace:stats
Strong
StrongI cannot figure out the meaning that the above message want to deliver 
to me and thus cannot find a way to view the code of the predict method with 
class varest. 
Strong
StrongCan anyone help me out?
Strong
StrongThanks in advance.  
Strong
Strong
StrongStrong Chen
Strong
Strong__
StrongR-help@r-project.org mailing list
Stronghttps://stat.ethz.ch/mailman/listinfo/r-help
StrongPLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
Strongand provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] permutations of a binary matrix with fixed margins

2007-10-02 Thread Paul Johnson
Jérôme,

As a first attempt, how about the function below. It works (or not) by
randomly sorting the rows and columns, then searching the table for
squares with the corners = matrix(c(1,0,0,1),ncol=2) and subtracting them
from 1 to give matrix(c(0,1,1,0),ncol=2) (and vice versa). Randomized
matrices can be produced as a chain where each permutation is seeded with
the previous one.

Potential problems:
1. Does it really explore all possible permutations? Does it do it in an
unbiased way?
2. Related to above: there is potential autocorrelation (although I haven't
found it with my data set), so might need some dememorisation steps.
3. It's slow and dememorisation makes it slower.
4. It isn't clear to me whether it needs the added stochastic element, i.e.
squares are only flipped if runif(1)0.5. In practice it seems to work
without it (as far as I can tell, i.e. it isn't autocorrelated using my data
set).

I suspect there's a much better way of doing this, which might take the
margins as an input, and therefore wouldn't be directly derived from any
particular matrix structure.

Paul

###

# function to permute binary matrix keeping margins fixed.
# the input x is the binary matrix to be permuted

  permute.struct-
function(x)
{
  x-x[rownames(x)[sample(1:nrow(x))],colnames(x)[sample(1:ncol(x))]]
  pattern-c(1,0,0,1)
  for(i in 1:(nrow(x)-1))
for(j in 1:(ncol(x)-1))
  for(ii in (i+1):nrow(x))
for(jj in (j+1):ncol(x))
{
  query-c(x[c(i,ii),c(j,jj)])
  if((all(query-pattern==0) || all(query==1-pattern)) 
runif(1)0.5)
x[c(i,ii),c(j,jj)] - 1 - x[c(i,ii),c(j,jj)]
}
  x
}

###


From: Mathieu Jérôme jerome.mathieu_at_snv.jussieu.fr 
Date: Thu 05 Apr 2007 - 13:34:01 GMT


Dear R Users, 
How can I randomize a matrix (with contains only 0 and 1) with the
constraint that margin totals (columns and rows sums) are kept constant?
Some have called this swap permutation or something like that. 

The principle is to find bloc of 
10 
01 
and change it for 
01 
10 
there can be several rows or columns between these numbers. It probably
exists in R, but I can't find the function. I am aware of permcont, but it
works only for 2x2 matrices 

thanks in advance 
Jerome 

-- 
Jérôme Mathieu
Laboratory of soil tropical ecology
UPMC
[EMAIL PROTECTED]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to add legend for image()?

2007-10-02 Thread zhijie zhang
Dear friends,
   The following is an example to explain my question. I want to get a
legend which will show the z-values according to different colors in image()
function.

x-sort(runif(10))  #x-coordinates
y-sort(runif(10))  #y-coordinates
z-matrix(runif(100),nrow=10)  #attributes values
image(x,y,z,col=gray((6:3)/6))
# legend(x,y,legend=z,col=gray((6:3)/6)) #error. the colors should denote
different z-values, i want to get this legend.

 Thanks.

-- 
With Kind Regards,

oooO:
(..):
:\.(:::Oooo::
::\_)::(..)::
:::)./:::
::(_/
:
[***]
Zhi Jie,Zhang ,PHD
Tel:86-21-54237149
Dept. of Epidemiology,School of Public Health,Fudan University
Address:No. 138 Yi Xue Yuan Road,Shanghai,China
Postcode:200032
Email:[EMAIL PROTECTED]
Website: www.statABC.com
[***]
oooO:
(..):
:\.(:::Oooo::
::\_)::(..)::
:::)./:::
::(_/
:

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lpSolve doesn't compile because of a malloc.h error

2007-10-02 Thread vittorio
Yes, it was very clear but being an absolute beginner as a programmer I was 
misled by the presence of both stdlib.h and malloc.h in lpslink55.c (in a 
nutshell, I thought Why s it complaining if stdlib.h is there, in the 
code??).

Anyway, thanks to your help I made it in the end.

Ciao
Vittorio

Monday 01 October 2007 20:47:36 Prof Brian Ripley wrote:
 Isn't the message rather clear?  You need to edit the package sources 
 replace malloc.h by stdlib.h (or remove the former if the latter is
 already there, as in this case).

 And BTW, 'gcc-4.2.2' is unreleased, but imminent (as R 2.6.0 is).

 On Mon, 1 Oct 2007, vittorio wrote:
  Under freebsd 6.2-p7 i386, R 2.5.1,gcc-4.2.2 I'm unable to compile
  package lpSolve because:
 
  hpbsd# R CMD INSTALL lpSolve_5.5.8.tar.gz
  * Installing to library '/usr/local/lib/R/library'
  * Installing *source* package 'lpSolve' ...
  ** libs
  cc -std=gnu99 -I/usr/local/lib/R/include -I/usr/local/lib/R/include -I .
  -DINTEGERTIME -DPARSER_LP -DBUILDING_FOR_R -DYY_NEVER_INTERACTIVE
  -DUSRDLL -DCLOCKTIME -DRoleIsExternalInvEngine
  -DINVERSE_ACTIVE=INVERSE_LUSOL -DINLINE=static -DParanoia
  -I/usr/local/include  -D__NO_MATH_INLINES  -fpic  -O2
  -fno-strict-aliasing -pipe -march=prescott -c colamd.c -o colamd.o
  cc -std=gnu99 -I/usr/local/lib/R/include -I/usr/local/lib/R/include -I .
  -DINTEGERTIME -DPARSER_LP -DBUILDING_FOR_R -DYY_NEVER_INTERACTIVE
  -DUSRDLL -DCLOCKTIME -DRoleIsExternalInvEngine
  -DINVERSE_ACTIVE=INVERSE_LUSOL -DINLINE=static -DParanoia
  -I/usr/local/include  -D__NO_MATH_INLINES  -fpic  -O2
  -fno-strict-aliasing -pipe -march=prescott -c commonlib.c -o commonlib.o
  
  cc -std=gnu99 -I/usr/local/lib/R/include -I/usr/local/lib/R/include -I .
  -DINTEGERTIME -DPARSER_LP -DBUILDING_FOR_R -DYY_NEVER_INTERACTIVE
  -DUSRDLL -DCLOCKTIME -DRoleIsExternalInvEngine
  -DINVERSE_ACTIVE=INVERSE_LUSOL -DINLINE=static -DParanoia
  -I/usr/local/include  -D__NO_MATH_INLINES  -fpic  -O2
  -fno-strict-aliasing -pipe -march=prescott -c lp_wlp.c -o lp_wlp.o
  cc -std=gnu99 -I/usr/local/lib/R/include -I/usr/local/lib/R/include -I .
  -DINTEGERTIME -DPARSER_LP -DBUILDING_FOR_R -DYY_NEVER_INTERACTIVE
  -DUSRDLL -DCLOCKTIME -DRoleIsExternalInvEngine
  -DINVERSE_ACTIVE=INVERSE_LUSOL -DINLINE=static -DParanoia
  -I/usr/local/include  -D__NO_MATH_INLINES  -fpic  -O2
  -fno-strict-aliasing -pipe -march=prescott -c lpslink55.c -o lpslink55.o
  In file included from lpslink55.c:11:
  /usr/include/malloc.h:3:2: #error malloc.h has been replaced by
  stdlib.h *** Error code 1
 
  Stop in /tmp/R.INSTALL.zVvtvK/lpSolve/src.
  ERROR: compilation failed for package 'lpSolve'
  ** Removing '/usr/local/lib/R/library/lpSolve'
  ===
 
  Now, both malloc.h and stdlib.h are under /usr/include.
 
  What's wrong with it?
 
  Ciao
  Vittorio
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Calculating proportions from a data frame rather than a table

2007-10-02 Thread Farrel Buchinsky
When one has raw data it is easy to create a table of one variable against
another and then calculate proportions
For example
a.nice.table-table(a,b)
prop.table(a.nice.table,1)

However, I looked at several papers and created a data frame of the
aggregate data. That means I acually created a table except it is a data
frame. The first column lists the name of the first author and the year.
I cannot find how to convert the data frame to a table so I can use great
functions such as prop.table and margin.table.

Alas I tried, rowSums but it provides lousy output without listing the names
of th papers.
I have thought of going through the reshape package but I suspect that there
is an easier way to convert a data frame to a table. Is there?

-- 
Farrel Buchinsky

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mcv package gamm function Error in chol(XVX + S)

2007-10-02 Thread Simon Wood
Well, it's certainly a problem detected by gamm, but whether it's a 
fundamental statistical problem with this model/data combination or something 
fixable by taking an alternative approach within gamm I can't tell. Would you 
be able to send me the data used here (I promise not to use it for anything 
else, of course), so I can figure out where the problem is?

best,
Simon 

 I'm using gamm function from Simon Wood's mgcv package, to fit a spatial
 regression Generalized Additive Mixed Model, as covariates I have the
 geographical longitude and latitude locations of indexed data. I include a
 random effect for each district (dist) so the code is

 fit - gamm(y~s(lon,lat,bs=tp, m=2)+offset(log(exp.)),
 random=list(dist=~1), family=poisson, niterPQL=30)

 when I run the gamm function I obtain the next error message:

 %%%
  Maximum number of PQL iterations:  30
 iteration 1
 iteration 2
 ...
 iteration 8
 iteration 9
 iteration 10
 Error in chol(XVX + S) : the leading minor of order 29 is not positive
 definite
 %%%

 Could be any problem in gamm() ???

 --

   [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html and provide commented, minimal,
 self-contained, reproducible code.

-- 
 Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
 +44 1225 386603  www.maths.bath.ac.uk/~sw283

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] zoo timeserie continuous? complete with NaN

2007-10-02 Thread marcg
dear r-list

I have a zoo object with 2 objects and time:

looks like:
2005-12-31 12:00:00   NA  NaN
2005-12-31 13:00:00   NA  NaN
2005-12-31 14:00:00   NA  NaN
2005-12-31 15:00:00   NA  NaN
2005-12-31 16:00:00   NA  NaN
2005-12-31 18:00:00   NA  NaN
2005-12-31 19:00:00   NA  NaN
2005-12-31 20:00:00   NA  NaN
2005-12-31 21:00:00   NA  NaN
2005-12-31 22:00:00   NA  NaN
2005-12-31 23:00:00   NA  NaN

its much longer though...

As you can see, the values for 2005-12-31 17:00:00 miss. How can I let R show 
me if the timeseries is continuous and if not, to fill up with NaN values (i 
need a continuous timeseries). is there a function like 
complete.timeseries(missing=NA)?

Thanks fpr your help

Marc
--

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] plot question

2007-10-02 Thread Tiandao Li
Hello,

I have a question about how to plot a series of data. The folloqing is my 
data matrix of n
 n
 25p5p  2.5p 0.5p
16B-E06.g 45379  4383  5123   45
16B-E06.g 45138  4028  6249   52
16B-E06.g 48457  4267  5470   54
16B-E06.g 47740  4676  6769   48
37B-B02.g 42860  6152 19276   72
35B-A02.g 48325 12863 38274  143
35B-A02.g 48410 12806 39013  175
35B-A02.g 48417  9057 40923  176
35B-A02.g 51403 13865 43338  161
45B-C12.g 50939  3656  5783   43
45B-C12.g 52356  5524  6041   55
45B-C12.g 49338  5141  5266   41
45B-C12.g 51567  3915  5677   43
35A-G04.g 40365  5513  6971   32
35B-D01.g 54217 12607 13067   93
35B-D01.g 55283 11441 14964  101
35B-D01.g 55041  9626 14928   94
35B-D01.g 54058  9465 14912   88
35B-A04.g 42745 12080 34271  105
35B-A04.g 41055 12423 34874  126

colnames(n) is concentrations, rownames(n) is gene IDs, and the rest is 
Intensity. I want to plot the data this way.
x-axis is colnames(n) in the order of 0.5p, 2.5p,5p,and 25p.
y-axis is Intensity
Inside of plot is the points of intensity over 4 concentrations, points 
from different genes have different color or shape. A regression line of 
each genes crosss different concetrations, and at the end of line is gene 
IDs.

Thanks,

Tiandao

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fitted values in LMER for the fixed-effects only

2007-10-02 Thread Douglas Bates
On 9/29/07, Dieter Menne [EMAIL PROTECTED] wrote:
 On 9/28/07, Anouk Simard Anouk.Simard at bio.ulaval.ca wrote:

   I would like to extract the fitted values from a model using LMER but
   only for the fix portion of the model and not for the fix and random
   portion (e.g it is the procedure outpm or outp in SAS).
 ..
 Quoting Douglas Bates bates at stat.wisc.edu

   The calculation of the level = 0 fitted values in the new
   representation of the fitted model is quite easy. It is
  
   [EMAIL PROTECTED] %*% fixef(fm1)
  

 Douglas Bates replied

  Yes.

 Mmm.. If I consider a few hundreds of messages in this list on the subject,
 it's considered dirty to use the [EMAIL PROTECTED] or fm1$X construct. So 
 should we
 expect that this is final (no longer fitted(), augPred etc), or is it work
 in prooogres?

This is to indicate that I am rather slow in getting the lme4 package
finished?  :-)

I agree that I am and no one is more impatient with the progress than I.

You are correct that it is considered bad form to reach into a fitted
model structure and pull out individual components or slots.  My brief
answer agreeing that this was still the best way of obtaining these
particular values should have emphasized the still.  I haven't
thought of a good way of specifying this particular form in, say, the
fitted method.  There is a clean way of specifying the levels of
random effects incorporated into the fitted values for models with
nested random effects only.  However, for crossed or partially crossed
random effects this is less clearly defined.

Perhaps a better alternative would be to define a method for the
model.matrix generic for lmer models.  Then the calculation could be
written as

model.matrix(fm1, type = fixed) %*% fixef(fm1)

using only extractor functions.

It may be surprising that sometimes the most difficult part of the
design of the code in a package is deciding on the names and arguments
of functions or methods.  In my experience rushing such decisions
often leads to a lot of maintenance headaches.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to add legend for image()?

2007-10-02 Thread Eric Thompson
Have you tried filled.contour()? It automatically generates a legend.
However, I'm not sure what is going on with your x's and y's being
sorted random numbers. I assume your actual data are not like this. If
you do not have data in an equally spaced grid you may need to create
one by some sort of interpolation (e.g. Kriging or inverse distance
weighting).



On 10/2/07, zhijie zhang [EMAIL PROTECTED] wrote:
 Dear friends,
The following is an example to explain my question. I want to get a
 legend which will show the z-values according to different colors in image()
 function.

 x-sort(runif(10))  #x-coordinates
 y-sort(runif(10))  #y-coordinates
 z-matrix(runif(100),nrow=10)  #attributes values
 image(x,y,z,col=gray((6:3)/6))
 # legend(x,y,legend=z,col=gray((6:3)/6)) #error. the colors should denote
 different z-values, i want to get this legend.

  Thanks.

 --
 With Kind Regards,

 oooO:
 (..):
 :\.(:::Oooo::
 ::\_)::(..)::
 :::)./:::
 ::(_/
 :
 [***]
 Zhi Jie,Zhang ,PHD
 Tel:86-21-54237149
 Dept. of Epidemiology,School of Public Health,Fudan University
 Address:No. 138 Yi Xue Yuan Road,Shanghai,China
 Postcode:200032
 Email:[EMAIL PROTECTED]
 Website: www.statABC.com
 [***]
 oooO:
 (..):
 :\.(:::Oooo::
 ::\_)::(..)::
 :::)./:::
 ::(_/
 :

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Apparently Conflicting Results with coxph

2007-10-02 Thread Terry Therneau
  From my experience, what you are seeing is almost certainly a patient 
selection effect.  (The number 1 reason for puzzling results is incorrect 
coding 
of a time-dependent covariate, but you appear to have been quite careful).
  
   Assigning the implant as a non-time dependent covariate almost guarrantees 
that the estimated effect will be beneficial.  The only people who get an 
implant are those who live longer than average (long enough to get an implant). 
 
The size of such a bias is surprisingly large.  The problem is rediscovered in 
the cancer field every few years, in comparisons of responders to 
non-responders. 

   As a time-dependent covariate, you have the problem of indication for 
treatment.  Say for instance that the devices were very expensive, and were 
only 
used for patients in immenent danger of death.  For a device that was a placebo 
you would find, not surprisingly, that being selected for implantation carried 
a 
major risk.  The device may need to be extremely effective to overcome this 
type 
of bias.  As a simple example, if you compare the death rate of those who have 
seen a oncologist (cancer doc) in the last month to those who have not done so, 
you find that the former group has a much higher death rate.
   
   Terry Therneau
   
 Kevin E. Thorpe wrote:
 Dear List:

 I have a data frame prepared in the couting process style for including
 a binary time-dependent covariate.  The first few rows look like this.

 PtNo StartEnd Status Imp
 1  1 0  608.0  0   0
 2  2 0  513.0  0   0
 3  2   513  887.0  0   1
 4  3 0   57.0  0   0
 5  357  604.0  0   1
 6  4 0  150.0  1   0


 The outcome is mortality and the covariate is for an implantable
 defibrillator, so it is expected that the implant would reduce the
 risk of death.  The results of fitting coxph (survival package) are:

 Call:
 coxph(formula = Surv(Start, End, Status) ~ Imp, data = nina.excl)


  coef exp(coef) se(coef) zp
 Imp 0.163  1.180.485 0.337 0.74

 Likelihood ratio test=0.11  on 1 df, p=0.738  n= 335

 Since this was unexpected, I created a non-counting process data
 frame with an indicator variable representing received an implant
 or not.  Here are the results:

 Call:
 coxph(formula = Surv(Days, Dead) ~ Implant, data = nina.excl0)


  coef exp(coef) se(coef) z   p
 Implant -1.77 0.1710.426 -4.15 3.3e-05

 Likelihood ratio test=19.1  on 1 df, p=1.21e-05  n= 197

 I found this degree of discrepancy surprising, especially the point
 estimate of the coefficient.  I have verified the data frames are
 set up correctly.

 Here is what I have tried to understand what is going on.

 I tried fitting models adjusted for other covariates that I have in
 the data frame.  This did not appreciably affect the coefficients
 for the implant variable.

 I ran cox.zph on the two models shown above and plotted the results.
 In both cases, the point estimate of Beta(t) is sort of parabolic
 in that the curves are monotonically increasing to a local maximum
 after which they are monotonically decreasing (the CIs are a bit
 more wiggly).

 I would interpret this to mean that the effect of implant is probably
 time-dependent.  If so, how do I actually get a proper estimate of
 beta(t) for a variable like this?

 Are there some other things I should look at to understand what's
 going on?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Apparently Conflicting Results with coxph

2007-10-02 Thread Charles C. Berry
On Tue, 2 Oct 2007, Kevin E. Thorpe wrote:

 Kevin E. Thorpe wrote:
 Peter Dalgaard wrote:
 Kevin E. Thorpe wrote:
 Dear List:

 I have a data frame prepared in the couting process style for including
 a binary time-dependent covariate.  The first few rows look like this.

 PtNo StartEnd Status Imp
 1  1 0  608.0  0   0
 2  2 0  513.0  0   0
 3  2   513  887.0  0   1
 4  3 0   57.0  0   0
 5  357  604.0  0   1
 6  4 0  150.0  1   0


 The outcome is mortality and the covariate is for an implantable
 defibrillator, so it is expected that the implant would reduce the
 risk of death.  The results of fitting coxph (survival package) are:

 Call:
 coxph(formula = Surv(Start, End, Status) ~ Imp, data = nina.excl)


  coef exp(coef) se(coef) zp
 Imp 0.163  1.180.485 0.337 0.74

 Likelihood ratio test=0.11  on 1 df, p=0.738  n= 335

 Since this was unexpected, I created a non-counting process data
 frame with an indicator variable representing received an implant
 or not.  Here are the results:

 Call:
 coxph(formula = Surv(Days, Dead) ~ Implant, data = nina.excl0)


  coef exp(coef) se(coef) z   p
 Implant -1.77 0.1710.426 -4.15 3.3e-05

 Likelihood ratio test=19.1  on 1 df, p=1.21e-05  n= 197

 I found this degree of discrepancy surprising, especially the point
 estimate of the coefficient.  I have verified the data frames are
 set up correctly.

 Here is what I have tried to understand what is going on.

 I tried fitting models adjusted for other covariates that I have in
 the data frame.  This did not appreciably affect the coefficients
 for the implant variable.

 I ran cox.zph on the two models shown above and plotted the results.
 In both cases, the point estimate of Beta(t) is sort of parabolic
 in that the curves are monotonically increasing to a local maximum
 after which they are monotonically decreasing (the CIs are a bit
 more wiggly).

 I would interpret this to mean that the effect of implant is probably
 time-dependent.  If so, how do I actually get a proper estimate of
 beta(t) for a variable like this?

 Are there some other things I should look at to understand what's
 going on?


 If you want to play with time-dependent regression coefficients have a
 look at the timereg package and the book that it supports.

 However, first you need to consider the possibility of selection effects
 that can take place even with non-varying effects. In the case at hand I
 would suspect a bias created by the fact that you don't implant devices
 into people who are already dead.


 Thanks.  The point in your last paragraph did cross my mind too.


 I thought about this some more, and I'm not sure that possibility is
 to blame.  In my time-dependent model, I don't think I'm doing
 anything different than is done for transplant in the Stanford
 Heart Study (the often used example for this kind of time-dependent
 covariate).  As in my case, you would not transplant a dead patient.
 So, I remain puzzled as to why my model is misbehaving.


Indeed, the bias created that Peter alludes to is precisely because you 
would not implant a device in a dead patient that the implantation of such 
a device has an apparent negative effect on hazard of mortality.

If a patient is enrolled and a date is set (perhaps implicitly) for 
implanting the device, then a patient who survives longer is more likely 
to receive the device.

Here is a setup in which patients have exponential hazard and are 
scheduled to receive devices uniformly over a fixed time interval 
following their enrollment. As you see the time of intended implant is 
independent of the time of death by construction, but the apparent effect 
of implantation is negative.

 time.of.death - rexp(1000)
 time.of.implant - runif(1000,0,2)
 implant.occurred - time.of.death  time.of.implant
 coxph( Surv( time.of.death ) ~ implant.occurred )
Call: coxph(formula = Surv(time.of.death) ~ implant.occurred)


   coef exp(coef) se(coef) z p
implant.occurredTRUE -1.80 0.1650.084 -21.5 0

Likelihood ratio test=519  on 1 df, p=0  n= 1000

HTH,

Chuck


 -- 
 Kevin E. Thorpe
 Biostatistician/Trialist, Knowledge Translation Program
 Assistant Professor, Department of Public Health Sciences
 Faculty of Medicine, University of Toronto
 email: [EMAIL PROTECTED]  Tel: 416.864.5776  Fax: 416.864.6057

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


Charles C. Berry(858) 534-2098
 Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]  UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901


Re: [R] Plot for two Series

2007-10-02 Thread Gabor Grothendieck
Here is an example:

ts.plot(ts(1:10, start = 2000), ts(2:8, start = 2005))


On 10/2/07, amna khan [EMAIL PROTECTED] wrote:
 Sir the data used in function ts.plot() is of the form


   Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov
 1974 2134 1863 1877 1877 1492 1249 1280 1131 1209 1492 1621
 1975 2103 2137 2153 1833 1403 1288 1186 1133 1053 1347 1545
 1976 2020 2750 2283 1479 1189 1160 1113  970  999 1208 1467
 1977 2240 1634 1722 1801 1246 1162 1087 1013  959 1179 1229
 1978 2019 2284 1942 1423 1340 1187 1098 1004  970 1140 1110
 1979 2263 1820 1846 1531 1215 1075 1056  975  940 1081 1294


 But I have the data series of the form

YEAR Khanpur
 1   195252.1
 2   195322.4
 3195426.4
 41955   132.8
 5   195671.9
 6   195710.4
 7   1958 9.1
 8   195949.8

 because of which the sacle on x-axis are the numbers 1,2 3,..etc.

 Please guid in this regard

 Thank you




 On 10/1/07, Gabor Grothendieck [EMAIL PROTECTED] wrote:
  See ?ts.plot
  Also ?plot.zoo and ?xyplot.zoo in the zoo package.
 
  On 10/1/07, amna khan  [EMAIL PROTECTED] wrote:
   Dear Sir
  
   I want to plot the two time series having unequal number of
 observations,
   say, one series having values from year 1960 to year 2000 and other
 having
   values from year 1975 to year 2000. How to plot them on a single graph
   sheet.
  
   Regards
  
  
   --
   AMINA SHAHZADI
   Department of Statistics
   GC University Lahore, Pakistan.
  
  [[alternative HTML version deleted]]
  
   __
   R-help@r-project.org mailing list
   https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
   and provide commented, minimal, self-contained, reproducible code.
  
 



 --

 AMINA SHAHZADI
 Department of Statistics
 GC University Lahore, Pakistan.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plot question

2007-10-02 Thread Eric Thompson
If I've correctly interpreted what you want, you first need to get the x values:

x - colnames(n)
x - as.numeric(substr(x, 1, nchar(x) - 1))

Then it seems fairly easy to use matplot to get the values with
different colors for each concentration

dim(x) - c(length(x), 1)
matplot(x, t(n), pch = 1)

But this does not look like a simple line will fit the data for each
gene well, so perhaps I've misunderstood something. You will have to
decide how you want to do the regression. It will also get very messy
and difficult to read with 20 lines (a different regression for each
gene). To do the regressions, plot the lines, and label with the gene
ID, see

?lm
?predict
?abline
?text




On 10/2/07, Tiandao Li [EMAIL PROTECTED] wrote:
 Hello,

 I have a question about how to plot a series of data. The folloqing is my
 data matrix of n
  n
  25p5p  2.5p 0.5p
 16B-E06.g 45379  4383  5123   45
 16B-E06.g 45138  4028  6249   52
 16B-E06.g 48457  4267  5470   54
 16B-E06.g 47740  4676  6769   48
 37B-B02.g 42860  6152 19276   72
 35B-A02.g 48325 12863 38274  143
 35B-A02.g 48410 12806 39013  175
 35B-A02.g 48417  9057 40923  176
 35B-A02.g 51403 13865 43338  161
 45B-C12.g 50939  3656  5783   43
 45B-C12.g 52356  5524  6041   55
 45B-C12.g 49338  5141  5266   41
 45B-C12.g 51567  3915  5677   43
 35A-G04.g 40365  5513  6971   32
 35B-D01.g 54217 12607 13067   93
 35B-D01.g 55283 11441 14964  101
 35B-D01.g 55041  9626 14928   94
 35B-D01.g 54058  9465 14912   88
 35B-A04.g 42745 12080 34271  105
 35B-A04.g 41055 12423 34874  126

 colnames(n) is concentrations, rownames(n) is gene IDs, and the rest is
 Intensity. I want to plot the data this way.
 x-axis is colnames(n) in the order of 0.5p, 2.5p,5p,and 25p.
 y-axis is Intensity
 Inside of plot is the points of intensity over 4 concentrations, points
 from different genes have different color or shape. A regression line of
 each genes crosss different concetrations, and at the end of line is gene
 IDs.

 Thanks,

 Tiandao

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to create this graph?

2007-10-02 Thread Greg Snow
You could create this graph using barplot (give a matrix with the
input), then use the text and possibly lines or segments functions to
add the text and any additional lines needed.

However, this graph is not clear to me, the box with 41 in it looks
bigger than the box with 59, are those supposed to be the values of the
boxes? Or somethning else.

If you can give a better description of what you are trying to
accomplish, then maybe we can give better advice (including possibly a
better graph to create).

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of squall44
 Sent: Monday, October 01, 2007 1:42 AM
 To: r-help@r-project.org
 Subject: [R] how to create this graph?
 
 
 Hello,
 
 I'm trying to create the following graph:
 
 http://www.nabble.com/file/p12974686/bar.gif 
 
 Is there a command (like pie(...) or barplot(...)) or is this 
 just a special case of eg barplot?
 
 I'd be glad if someone could give me a hint on where to start.
 
 Regards,
 Tobias
 --
 View this message in context: 
 http://www.nabble.com/how-to-create-this-graph--tf4546742.html
 #a12974686
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] continuous boxplot?

2007-10-02 Thread Greg Snow
Here are a couple of things that might help:

Look at the hexbin package (I think it is on bioconductor rather than cran)
Look at the quantreg package (for estimating the quantiles to plot)

Look at the running and wapply functions in the gtools package for another 
option to estimate the quantiles.

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Karin Lagesen
 Sent: Monday, October 01, 2007 2:32 AM
 To: r-help@r-project.org
 Subject: [R] continuous boxplot?
 
 
 
 I have two vectors x and y, which I would like to plot 
 against each other. I am also displaying other data in this 
 plot. However, I have about 1 million points to plot, and 
 just plotting them x againt y is not very informative. What 
 I'd like to do is to do sort of a continuous box plot. 
 
 My x values goes from -1 to 1 and my y values from 0 to 1, so 
 I´d like to plot the median and quantiles, and possibly also 
 all of the outliers somehow. Are there any facilities in R 
 for doing something like this, or would I need to do this the 
 hard coded way?
 
 Thankyou very much for your help!
 
 Karin
 --
 Karin Lagesen, PhD student
 [EMAIL PROTECTED]
 http://folk.uio.no/karinlag
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] phyper returns negative results

2007-10-02 Thread Prof Brian Ripley
On Tue, 2 Oct 2007, Bastian Angermann wrote:

 Dear R-users,

 In R 2.5.1 on Windows XP, SP2 the call to 
 phyper(0,0,74,3,lower.tail=FALSE) returns -4.195862e-17. This does not 
 happen with R2.5.1 on Linux, 0 is returned. Is this a bug (and should be 
 reported as such), since phyper should not return negative values, or is 
 this considered as one of the annoyances of finite precision arithmetic. 
 I tend to think of this as a bug since it breaks the assumption 
 phyper()=0.

You are specifically asked to try out the latest version before calling 
'bug' (in the R FAQ, in the R posting guide ...).

R version 2.6.0 RC (2007-10-01 r43049)
...
 phyper(0,0,74,3,lower.tail=FALSE)
[1] 0

on Windows.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] continuous boxplot?

2007-10-02 Thread Jim Porzak
Karin,

I like to use bagplots in these cases where there are a lot of cases and
scatter plots become one big smudge.

See
http://www.wiwi.uni-bielefeld.de/~wolf/software/R-wtools/bagplot/bagplot.pdf

And some further examples on slides 36 - 39 of
http://www.porzak.com/JimArchive/JimPorzak_CIwithR_useR2006_tutorial.pdf

-- 
HTH,
Jim Porzak
Responsys, Inc.
San Francisco, CA
http://www.linkedin.com/in/jimporzak

On 10/1/07, Karin Lagesen [EMAIL PROTECTED] wrote:



 I have two vectors x and y, which I would like to plot against each
 other. I am also displaying other data in this plot. However, I have
 about 1 million points to plot, and just plotting them x againt y is
 not very informative. What I'd like to do is to do sort of a
 continuous box plot.

 My x values goes from -1 to 1 and my y values from 0 to 1, so I´d like
 to plot the median and quantiles, and possibly also all of the
 outliers somehow. Are there any facilities in R for doing something
 like this, or would I need to do this the hard coded way?

 Thankyou very much for your help!

 Karin
 --
 Karin Lagesen, PhD student
 [EMAIL PROTECTED]
 http://folk.uio.no/karinlag

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Variable selection in R

2007-10-02 Thread Yves Moisan

Disclaimer : Short of having local statistical expertise at hand, I'm using
this list because I use R for variable selection in the context of linear
multiple regression but the questions I have relate more to basic statistics
than to R per se.  Please redirect me to another appropriate list if such a
list exists.


I have the very common problem of identifying which (subset of) variables
are important in a multiple linear regression problem.  Googling and
browsing around, I ended up with four methods I could easily access through
R packages.  My aim was to use those methods with continuous variables in a
bid to find out which had an effect on my dependent variable, but more so I
thought this would lead me to identify variables amenable to categorization. 
Example : I have population as a continuous variable so I figured if some
variable selection algorithm flagged population as significant, then I
could try and apply a priori knowledge of the classes of population ranges I
suspect could show different behaviours in the form of statistically
different means as per ANOVA analysis.  Coming back to my original variable
selection need, here's what I did.


First : regular lm
Second : step
Third : all-subsets (regsubsets, package leaps)
Fourth : lasso (l1ce, package lasso2)
Fifth : (I meant to use the lars package, but it does not allow for
formulas; I know I could cast my dataset as matrices, but I didn't find an
easy way of doing this and I figured I had enough options already)


I'm trying to make sense of the information that is sent back at me from the
summary calls.  I'm looking for what variables are identified by each
method, hoping to find comparable/complementary results.  Given I'm a
transient user of statistics, I find many of the Details sections on the
help files lack specific instructions as to exactly how to interpret the
results.  Here we go


*lm*

 
anova(tonlm)
Analysis of Variance Table

Response: Cout.ton
 Df  Sum Sq Mean Sq F valuePr(F)
Tonnage   197209720  1.3497 0.2470437
Popul 1  112361  112361 15.6014 0.0001164 ***
DensiteOcc1  173350  173350 24.0699 2.245e-06 ***
NbmRues.hab   1 280 280  0.0389 0.8438903
RFU.hab   1 183 183  0.0254 0.8734816
UO.MAMR.Precis1   67161   67161  9.3254 0.0026428 ** 
t.CS.t.déchets.MAMR   1   24188   24188  3.3586 0.0686925 .  
Pct.Pot.CS.hab1   78764   78764 10.9365 0.0011614 ** 
NbCentresTriDs100km   1  218725  218725 30.3702 1.380e-07 ***
DistMarche1   54114   54114  7.5137 0.0068110 ** 
Residuals   162 11667177202


*ANOVA applied on step(lm) *

Response: Cout.ton
 Df  Sum Sq Mean Sq F valuePr(F)
Popul 129172917  0.4038  0.525990
DensiteOcc1  163741  163741 22.6730 4.179e-06 ***
RFU.hab   1  83  83  0.0115  0.914899
UO.MAMR.Precis1  168407  168407 23.3191 3.112e-06 ***
Pct.Pot.CS.hab1  122685  122685 16.9880 5.943e-05 ***
NbCentresTriDs100km   1  190659  190659 26.4002 7.761e-07 ***
DistMarche1   65467   65467  9.0652  0.003015 ** 
Residuals   165 11916067222


Questions compared to lm above : What tells me which variable was selected
first in the stepwise process ?  Do I sort Pr(F), the lowest value of which
corresponds to the first variable ?  Popul has a 3 star rating in lm and
nothing in step.  How do I interpret that ?

* all-subsets regression *

 summary(tonall)$cp

 [1] 37.338138 31.452375 25.300272 17.965950 13.751043 10.021910  8.455827 
8.810498  9.273599 11.00
 summary(tonall)$adjr2

 [1] 0.2155976 0.2411378 0.2680048 0.2997661 0.3197652 0.3381030 0.3481410
0.3506880 0.3528339 0.3499369

 summary(tonall)$which[1,][summary(tonall)$which[1,]]

(Intercept)  DistMarche 
 summary(tonall)$which[2,][summary(tonall)$which[2,]]

(Intercept), NbCentresTriDs100km, DistMarche 
 summary(tonall)$which[3,][summary(tonall)$which[3,]]

(Intercept), Tonnage,  UO.MAMR.Precis, NbCentresTriDs100km  
 summary(tonall)$which[4,][summary(tonall)$which[4,]]

(Intercept), Tonnage, DensiteOcc, UO.MAMR.Precis, NbCentresTriDs100km  
 summary(tonall)$which[5,][summary(tonall)$which[5,]]

(Intercept), Tonnage, DensiteOcc, UO.MAMR.Precis NbCentresTriDs100km,
DistMarche  


omitting the remaining values up to which[10,]


Questions w/r to lm and step : all-subsets says DistMarche is the single
most important.  That makes some sense as that variable had a two-star
rating in both lm and step.  But shouldn't the 3-star ratings in step be
close to those in all-subsets ?  For example,  the best 3-variable model
shows Tonnage popping up.  Tonnage has no rating in lm and doesn't even show
in step.  Is it a matter of step being initialized with a variable such that
Tonnage will never be considered whereas it is in an exhaustive all-subsets
regression ? I'm puzzled.

* lasso *

 

Re: [R] plot question

2007-10-02 Thread Tiandao Li
Thanks for your quick reply, Eric.

I want plot colnames(n) as string on x-axis. If the regression lines don't 
fit the data very well, it is OK, the plot is only for quality check.


On Tue, 2 Oct 2007, Eric Thompson wrote:

If I've correctly interpreted what you want, you first need to get the x values:

x - colnames(n)
x - as.numeric(substr(x, 1, nchar(x) - 1))

Then it seems fairly easy to use matplot to get the values with
different colors for each concentration

dim(x) - c(length(x), 1)
matplot(x, t(n), pch = 1)

But this does not look like a simple line will fit the data for each
gene well, so perhaps I've misunderstood something. You will have to
decide how you want to do the regression. It will also get very messy
and difficult to read with 20 lines (a different regression for each
gene). To do the regressions, plot the lines, and label with the gene
ID, see

?lm
?predict
?abline
?text




On 10/2/07, Tiandao Li [EMAIL PROTECTED] wrote:
 Hello,

 I have a question about how to plot a series of data. The folloqing is my
 data matrix of n
  n
  25p5p  2.5p 0.5p
 16B-E06.g 45379  4383  5123   45
 16B-E06.g 45138  4028  6249   52
 16B-E06.g 48457  4267  5470   54
 16B-E06.g 47740  4676  6769   48
 37B-B02.g 42860  6152 19276   72
 35B-A02.g 48325 12863 38274  143
 35B-A02.g 48410 12806 39013  175
 35B-A02.g 48417  9057 40923  176
 35B-A02.g 51403 13865 43338  161
 45B-C12.g 50939  3656  5783   43
 45B-C12.g 52356  5524  6041   55
 45B-C12.g 49338  5141  5266   41
 45B-C12.g 51567  3915  5677   43
 35A-G04.g 40365  5513  6971   32
 35B-D01.g 54217 12607 13067   93
 35B-D01.g 55283 11441 14964  101
 35B-D01.g 55041  9626 14928   94
 35B-D01.g 54058  9465 14912   88
 35B-A04.g 42745 12080 34271  105
 35B-A04.g 41055 12423 34874  126

 colnames(n) is concentrations, rownames(n) is gene IDs, and the rest is
 Intensity. I want to plot the data this way.
 x-axis is colnames(n) in the order of 0.5p, 2.5p,5p,and 25p.
 y-axis is Intensity
 Inside of plot is the points of intensity over 4 concentrations, points
 from different genes have different color or shape. A regression line of
 each genes crosss different concetrations, and at the end of line is gene
 IDs.

 Thanks,

 Tiandao

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plot question

2007-10-02 Thread Eric Thompson
Okay. If you want to customize the axis labels, you can suppress the
defaults by changing the matplot call to

matplot(x, t(n), pch = 1, axes = FALSE)

And then adding them how you want:

axis(side = 2)
axis(side = 3, at = x, lab = colnames(n))
box()



On 10/2/07, Tiandao Li [EMAIL PROTECTED] wrote:
 Thanks for your quick reply, Eric.

 I want plot colnames(n) as string on x-axis. If the regression lines don't
 fit the data very well, it is OK, the plot is only for quality check.


 On Tue, 2 Oct 2007, Eric Thompson wrote:

 If I've correctly interpreted what you want, you first need to get the x 
 values:

 x - colnames(n)
 x - as.numeric(substr(x, 1, nchar(x) - 1))

 Then it seems fairly easy to use matplot to get the values with
 different colors for each concentration

 dim(x) - c(length(x), 1)
 matplot(x, t(n), pch = 1)

 But this does not look like a simple line will fit the data for each
 gene well, so perhaps I've misunderstood something. You will have to
 decide how you want to do the regression. It will also get very messy
 and difficult to read with 20 lines (a different regression for each
 gene). To do the regressions, plot the lines, and label with the gene
 ID, see

 ?lm
 ?predict
 ?abline
 ?text




 On 10/2/07, Tiandao Li [EMAIL PROTECTED] wrote:
  Hello,
 
  I have a question about how to plot a series of data. The folloqing is my
  data matrix of n
   n
   25p5p  2.5p 0.5p
  16B-E06.g 45379  4383  5123   45
  16B-E06.g 45138  4028  6249   52
  16B-E06.g 48457  4267  5470   54
  16B-E06.g 47740  4676  6769   48
  37B-B02.g 42860  6152 19276   72
  35B-A02.g 48325 12863 38274  143
  35B-A02.g 48410 12806 39013  175
  35B-A02.g 48417  9057 40923  176
  35B-A02.g 51403 13865 43338  161
  45B-C12.g 50939  3656  5783   43
  45B-C12.g 52356  5524  6041   55
  45B-C12.g 49338  5141  5266   41
  45B-C12.g 51567  3915  5677   43
  35A-G04.g 40365  5513  6971   32
  35B-D01.g 54217 12607 13067   93
  35B-D01.g 55283 11441 14964  101
  35B-D01.g 55041  9626 14928   94
  35B-D01.g 54058  9465 14912   88
  35B-A04.g 42745 12080 34271  105
  35B-A04.g 41055 12423 34874  126
 
  colnames(n) is concentrations, rownames(n) is gene IDs, and the rest is
  Intensity. I want to plot the data this way.
  x-axis is colnames(n) in the order of 0.5p, 2.5p,5p,and 25p.
  y-axis is Intensity
  Inside of plot is the points of intensity over 4 concentrations, points
  from different genes have different color or shape. A regression line of
  each genes crosss different concetrations, and at the end of line is gene
  IDs.
 
  Thanks,
 
  Tiandao
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Problem loading a txt file as a data.frame

2007-10-02 Thread Ettinger, Nicholas
Hello all,

I know that this is a terribly banal question but I cannot seem to solve
it.

I am trying to load in data from a tab-delimited text file.  Some
columns are mixed text-numbers and other columns are strictly numbers.
Some cells are blank.

My command is:
MDMT_RPup  -  read.table(GCRMA_MDM-T_RPup.txt, header=T,
sep=\t, row.names=NULL, fill=TRUE)

The problem is that this text file should load with 118 rows, 7 columns
of data.  But when I execute the following command:

dim(MDMT_RPup)
I only get:
40  7

Other similarly generated text files of varying sizes (9 rows, 72 rows,
etc.) seem to load just fine.

I have tried opening the text file in MS Excel and deleting all columns
to the right of the data in case there was some kind of hidden
character.  No help.

Any ideas?
--Nick

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] continuous boxplot?

2007-10-02 Thread Bert Gunter


Folks:

I found the references in the previous replies to this vexing data
visualization issue to be quite interesting and useful. I think it fair to
say that there is no single best way to do this -- it all depends on what
you need to learn , and probably several alternative displays will be
necessary to get the important information the data have to convey.
However,as always, this issue has been considered before, and it may be
worthwhile to at least consider an already available standard approach
using shingles and a trellis-type plot. ?xyplot and ?shingle should get you
started (you probably want to shingle or bin on quantiles of y). The
canonical reference is Bill Cleveland's VISUALIZING DATA (see coplots). 


Bert Gunter
Genentech Nonclinical Statistics


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
Behalf Of Jim Porzak
Sent: Tuesday, October 02, 2007 11:19 AM
To: Karin Lagesen
Cc: r-help@r-project.org
Subject: Re: [R] continuous boxplot?

Karin,

I like to use bagplots in these cases where there are a lot of cases and
scatter plots become one big smudge.

See
http://www.wiwi.uni-bielefeld.de/~wolf/software/R-wtools/bagplot/bagplot.pdf

And some further examples on slides 36 - 39 of
http://www.porzak.com/JimArchive/JimPorzak_CIwithR_useR2006_tutorial.pdf

-- 
HTH,
Jim Porzak
Responsys, Inc.
San Francisco, CA
http://www.linkedin.com/in/jimporzak

On 10/1/07, Karin Lagesen [EMAIL PROTECTED] wrote:



 I have two vectors x and y, which I would like to plot against each
 other. I am also displaying other data in this plot. However, I have
 about 1 million points to plot, and just plotting them x againt y is
 not very informative. What I'd like to do is to do sort of a
 continuous box plot.

 My x values goes from -1 to 1 and my y values from 0 to 1, so I4d like
 to plot the median and quantiles, and possibly also all of the
 outliers somehow. Are there any facilities in R for doing something
 like this, or would I need to do this the hard coded way?

 Thankyou very much for your help!

 Karin
 --
 Karin Lagesen, PhD student
 [EMAIL PROTECTED]
 http://folk.uio.no/karinlag

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Strange names when creating a data.frame: Difference between - and =

2007-10-02 Thread John Kane
This is just a curiosity question.  Why do the two
different syntaxes for df1 and df2 give such different
results in the names(dfx)?

Thanks


df1 - data.frame(nas = c(A,  B , B ,C ,B,
A, D),nums = c(3,2,1,   1,   2,3,  7))

df2 - data.frame(nas  - c(A,  B , B ,C ,B,
A, D),nums  - c(3,2,1,   1,   2,3, 
7))

df1; df2

df1
  nas nums
1   A3
2   B2
.   .

  df2

   nasc..ABBCBAD..
numsc.3..2..1..1..2..3..7.
1   A 
3
2   B 

3
.   .

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Adehabitat package question - trying to generate animal home ranges

2007-10-02 Thread Peter H Singleton

Hello,

I'm new to R and am trying to use the adehabitat package for home range and
habitat selection analysis for some animal radiotelemetry data sets I have,
but I can't get the home range functions (mcp  kernelUD) to work with my
data. I think that my problem is that I don't properly understand how the
package uses data frames and factors, and I may not be defining the id
factor correctly.

I've started out by trying to imitate the code in the homerange.r demo that
comes with the ade package. The code I'm trying to repeat with my data is:
###
###
###
### loads the data

library(adehabitat)
data(puechabon)
xy-puechabon$locs[,c(X,Y)]
id-puechabon$locs$Name

## The data are:
xy[1:4,] ## relocations coordinates
id[1:4]  ## ID


###
###
###
### Home ranges

## MCP
hr-mcp(xy, id) ## home range estimation
plot(hr)## displays the MCP
(jj-mcp.area(xy, id))  ## home range size
plot(jj)## plots home-range size


## Kernel home range
(hr-kernelUD(xy, id, h=LSCV, grid=100)) ## UD estimation
plotLSCV(hr)   ## LSCV criterion
image(hr)  ## displays the UD
(jj-kernel.area(xy, id))  ## home range size
plot(jj)   ## Plots home range size
ver - getverticeshr(hr)   ## home-range contours
plot(ver)  ## Plots contours

==
The following is as far as I have gotten with my attempt to repeat the
above:

 require(adehabitat)
...
## I can get my data into the data frame object test
 test = read.csv(c:/rjunk/testdata.csv, header=T)
 test
   BIRD_LOC BIRD SEX  DATE TIME DATEVALUE SEASON  X   Y
1   CBF1001  CBF   F 5/18/2004 1756 38125   04BR 675022 5302087
2   CBF1002  CBF   F 5/19/2004 1817 38126   04BR 674988 5302111
...
90  CBM1043  CBM   M 9/28/2004  739 38258   04BR 675305 5302229
91  CBM1044  CBM   M 10/1/2004  738 38261   04BR 675005 5302109

## What does the $locs do? Am I correct that this is referring to something
that I don't have defined in my data frame?
 xy-test$locs[,c(X,Y)]
 id-test$locs$bird
 xy
NULL
 id
NULL

## I figured out how to subset the data into my XY coordinates and the BIRD
id lables, but the mcp and kernelUD functions don't work with these
subsets. Do I need to do something else to identify BIRD as a factor? Is
there a data prep step that I'm missing?
 xy - subset(test, select = c(X, Y))
 xy
X   Y
1  675022 5302087
2  674988 5302111
...
90 675305 5302229
91 675005 5302109
 id - subset(test, select = c(BIRD))
 id
   BIRD
1   CBF
2   CBF
...
90  CBM
91  CBM
 hr - mcp(xy, id)
Error in mcp(xy, id) : xy and id should be of the same length
 hr - kernelUD(xy, id, h=LSCV, grid=100)
Error in kernelUD(xy, id, h = LSCV, grid = 100) :
id should have the same length as xy


---
Peter Singleton
Wildlife Ecologist
USFS Pacific Northwest Research Station
1133 N. Western Ave.
Wenatchee WA 98801
Phone: (509)664-1732
Fax: (509)665-8362
E-mail: [EMAIL PROTECTED]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plot question

2007-10-02 Thread Tiandao Li
However, I got the following msg.

 matplot(colnames(n), t(n), pch = 1, axes = FALSE)
Error in plot.window(xlim, ylim, log, asp, ...) : 
need finite 'xlim' values
In addition: Warning messages:
1: NAs introduced by coercion in: as.double.default(x) 
2: no non-missing arguments to min; returning Inf in: min(x) 
3: no non-missing arguments to max; returning -Inf in: max(x) 
4: NAs introduced by coercion in: as.double.default(x) 



On Tue, 2 Oct 2007, Eric Thompson wrote:

Okay. If you want to customize the axis labels, you can suppress the
defaults by changing the matplot call to

matplot(x, t(n), pch = 1, axes = FALSE)

And then adding them how you want:

axis(side = 2)
axis(side = 3, at = x, lab = colnames(n))
box()



On 10/2/07, Tiandao Li [EMAIL PROTECTED] wrote:
 Thanks for your quick reply, Eric.

 I want plot colnames(n) as string on x-axis. If the regression lines don't
 fit the data very well, it is OK, the plot is only for quality check.


 On Tue, 2 Oct 2007, Eric Thompson wrote:

 If I've correctly interpreted what you want, you first need to get the x 
 values:

 x - colnames(n)
 x - as.numeric(substr(x, 1, nchar(x) - 1))

 Then it seems fairly easy to use matplot to get the values with
 different colors for each concentration

 dim(x) - c(length(x), 1)
 matplot(x, t(n), pch = 1)

 But this does not look like a simple line will fit the data for each
 gene well, so perhaps I've misunderstood something. You will have to
 decide how you want to do the regression. It will also get very messy
 and difficult to read with 20 lines (a different regression for each
 gene). To do the regressions, plot the lines, and label with the gene
 ID, see

 ?lm
 ?predict
 ?abline
 ?text




 On 10/2/07, Tiandao Li [EMAIL PROTECTED] wrote:
  Hello,
 
  I have a question about how to plot a series of data. The folloqing is my
  data matrix of n
   n
   25p5p  2.5p 0.5p
  16B-E06.g 45379  4383  5123   45
  16B-E06.g 45138  4028  6249   52
  16B-E06.g 48457  4267  5470   54
  16B-E06.g 47740  4676  6769   48
  37B-B02.g 42860  6152 19276   72
  35B-A02.g 48325 12863 38274  143
  35B-A02.g 48410 12806 39013  175
  35B-A02.g 48417  9057 40923  176
  35B-A02.g 51403 13865 43338  161
  45B-C12.g 50939  3656  5783   43
  45B-C12.g 52356  5524  6041   55
  45B-C12.g 49338  5141  5266   41
  45B-C12.g 51567  3915  5677   43
  35A-G04.g 40365  5513  6971   32
  35B-D01.g 54217 12607 13067   93
  35B-D01.g 55283 11441 14964  101
  35B-D01.g 55041  9626 14928   94
  35B-D01.g 54058  9465 14912   88
  35B-A04.g 42745 12080 34271  105
  35B-A04.g 41055 12423 34874  126
 
  colnames(n) is concentrations, rownames(n) is gene IDs, and the rest is
  Intensity. I want to plot the data this way.
  x-axis is colnames(n) in the order of 0.5p, 2.5p,5p,and 25p.
  y-axis is Intensity
  Inside of plot is the points of intensity over 4 concentrations, points
  from different genes have different color or shape. A regression line of
  each genes crosss different concetrations, and at the end of line is gene
  IDs.
 
  Thanks,
 
  Tiandao
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem loading a txt file as a data.frame

2007-10-02 Thread Charles C. Berry
On Tue, 2 Oct 2007, Ettinger, Nicholas wrote:

 Hello all,

 I know that this is a terribly banal question but I cannot seem to solve
 it.

 I am trying to load in data from a tab-delimited text file.  Some
 columns are mixed text-numbers and other columns are strictly numbers.
 Some cells are blank.

 My command is:
 MDMT_RPup-  read.table(GCRMA_MDM-T_RPup.txt, header=T,
 sep=\t, row.names=NULL, fill=TRUE)

You have specified neither quote nor comment.char.

The default for quote is \'. You might try changing it to  or \.


 The problem is that this text file should load with 118 rows, 7 columns
 of data.  But when I execute the following command:

 dim(MDMT_RPup)
 I only get:
 40  7

Look at each of the rows. Where things start going crazy you need to 
inspect the data.

You might use readLines() and cat() to read in and inspect the data rather 
than a text editor.

HTH,

Chuck


 Other similarly generated text files of varying sizes (9 rows, 72 rows,
 etc.) seem to load just fine.

 I have tried opening the text file in MS Excel and deleting all columns
 to the right of the data in case there was some kind of hidden
 character.  No help.

 Any ideas?
 --Nick

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


Charles C. Berry(858) 534-2098
 Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]  UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plot question

2007-10-02 Thread Eric Thompson
But you did not use the command I suggested: you replaced x with
colnames(n), which is a vector of characters. Characters such as 25p
has little meaning for plotting:

 as.numeric(25p)
[1] NA

So you've tried to plot a bunch of NA's, which is why it doesn't know
what limits to use for xlim.

That is why I gave you the commands to convert that character array
colnames(n) to a meaningful numeric array. It is unreasonable to
expect R to guess that 25p = 25.

x - colnames(n)
x - as.numeric(substr(x, 1, nchar(x) - 1))
matplot(x, t(n), pch = 1, axes = FALSE)
axis(side = 2)
axis(side = 3, at = x, lab = colnames(n))
box()



On 10/2/07, Tiandao Li [EMAIL PROTECTED] wrote:
 However, I got the following msg.

  matplot(colnames(n), t(n), pch = 1, axes = FALSE)
 Error in plot.window(xlim, ylim, log, asp, ...) :
 need finite 'xlim' values
 In addition: Warning messages:
 1: NAs introduced by coercion in: as.double.default(x)
 2: no non-missing arguments to min; returning Inf in: min(x)
 3: no non-missing arguments to max; returning -Inf in: max(x)
 4: NAs introduced by coercion in: as.double.default(x)



 On Tue, 2 Oct 2007, Eric Thompson wrote:

 Okay. If you want to customize the axis labels, you can suppress the
 defaults by changing the matplot call to

 matplot(x, t(n), pch = 1, axes = FALSE)

 And then adding them how you want:

 axis(side = 2)
 axis(side = 3, at = x, lab = colnames(n))
 box()



 On 10/2/07, Tiandao Li [EMAIL PROTECTED] wrote:
  Thanks for your quick reply, Eric.
 
  I want plot colnames(n) as string on x-axis. If the regression lines don't
  fit the data very well, it is OK, the plot is only for quality check.
 
 
  On Tue, 2 Oct 2007, Eric Thompson wrote:
 
  If I've correctly interpreted what you want, you first need to get the x 
  values:
 
  x - colnames(n)
  x - as.numeric(substr(x, 1, nchar(x) - 1))
 
  Then it seems fairly easy to use matplot to get the values with
  different colors for each concentration
 
  dim(x) - c(length(x), 1)
  matplot(x, t(n), pch = 1)
 
  But this does not look like a simple line will fit the data for each
  gene well, so perhaps I've misunderstood something. You will have to
  decide how you want to do the regression. It will also get very messy
  and difficult to read with 20 lines (a different regression for each
  gene). To do the regressions, plot the lines, and label with the gene
  ID, see
 
  ?lm
  ?predict
  ?abline
  ?text
 
 
 
 
  On 10/2/07, Tiandao Li [EMAIL PROTECTED] wrote:
   Hello,
  
   I have a question about how to plot a series of data. The folloqing is my
   data matrix of n
n
25p5p  2.5p 0.5p
   16B-E06.g 45379  4383  5123   45
   16B-E06.g 45138  4028  6249   52
   16B-E06.g 48457  4267  5470   54
   16B-E06.g 47740  4676  6769   48
   37B-B02.g 42860  6152 19276   72
   35B-A02.g 48325 12863 38274  143
   35B-A02.g 48410 12806 39013  175
   35B-A02.g 48417  9057 40923  176
   35B-A02.g 51403 13865 43338  161
   45B-C12.g 50939  3656  5783   43
   45B-C12.g 52356  5524  6041   55
   45B-C12.g 49338  5141  5266   41
   45B-C12.g 51567  3915  5677   43
   35A-G04.g 40365  5513  6971   32
   35B-D01.g 54217 12607 13067   93
   35B-D01.g 55283 11441 14964  101
   35B-D01.g 55041  9626 14928   94
   35B-D01.g 54058  9465 14912   88
   35B-A04.g 42745 12080 34271  105
   35B-A04.g 41055 12423 34874  126
  
   colnames(n) is concentrations, rownames(n) is gene IDs, and the rest is
   Intensity. I want to plot the data this way.
   x-axis is colnames(n) in the order of 0.5p, 2.5p,5p,and 25p.
   y-axis is Intensity
   Inside of plot is the points of intensity over 4 concentrations, points
   from different genes have different color or shape. A regression line of
   each genes crosss different concetrations, and at the end of line is gene
   IDs.
  
   Thanks,
  
   Tiandao
  
   __
   R-help@r-project.org mailing list
   https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide 
   http://www.R-project.org/posting-guide.html
   and provide commented, minimal, self-contained, reproducible code.
  
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] unique rows in data frame

2007-10-02 Thread jim holtman
Here is one way to get the means:

 x
  x1 x2
1  A  1
2  B  2
3  B  3
 aggregate(x$x2, list(x$x1),mean)
  Group.1   x
1   A 1.0
2   B 2.5



On 10/2/07, Dieter Best [EMAIL PROTECTED] wrote:
 Hello there,

  I have a data frame a small version of which could look like the following:

x1 x2
 1  A  1
 2  B  2
 3  B  3

  Now I need to remove rows which are duplicate in x1, i.e. in the example 
 above I would remove row 3.

  I have an ugly solution with for and while loops and ifs. ... And of course 
 my data set is much larger and my solution takes along time.

  Any ideas what could be the best way to do this in R?

  Better yet: I actually would like to sort of collapse row 2 and 3 in the 
 example above by replacing 2 and 3 with a new row 2 which has in x2 the mean 
 of old x2 of row 2 and 3 (maybe this is poorly said).

  Anyways, thanks a lot in advance for suggestions.

  -- D



 -

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Strange names when creating a data.frame: Difference between - and =

2007-10-02 Thread jim holtman
your two dataframe defintions are different.  In the case of df1, you
have 'names' the elements, and in the case of df2,you have done an
assignment to 'nam' and 'num' (look in the workspace and you will see
the object.  Since you haven't names the elements in df2, is tries to
constuct a name from the expressions that is given (e.g., nums = c(3,
  2,1,   1,   2,3,  7)), so you get.

This is what happens:

 make.names(nums  - c(3,2,1,   1,   2,3, 7))
[1] nums.c.3.2.112.3..7.

It tries to make a name from the expression.

On 10/2/07, John Kane [EMAIL PROTECTED] wrote:
 This is just a curiosity question.  Why do the two
 different syntaxes for df1 and df2 give such different
 results in the names(dfx)?

 Thanks


 df1 - data.frame(nas = c(A,  B , B ,C ,B,
 A, D),nums = c(3,2,1,   1,   2,3,  7))

 df2 - data.frame(nas  - c(A,  B , B ,C ,B,
 A, D),nums  - c(3,2,1,   1,   2,3,
 7))

 df1; df2

 df1
  nas nums
 1   A3
 2   B2
.   .

   df2

   nasc..ABBCBAD..
 numsc.3..2..1..1..2..3..7.
 1   A
3
 2   B

 3
 .   .

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] continuous boxplot?

2007-10-02 Thread hadley wickham
A couple of other nice references for dealing with many points in a
scatterplot are:

D. B. Carr, R. J. Littlefield, W. L. Nicholson, and J. S. Littlefield.
Scatterplot matrix techniques for large n. Journal of the American
Statistical Association, 82(398):424–436, 1987.

W. S. Cleveland and R. McGill. The many faces of a scatterplot.
Journal of the American Statistical Association, 79(388):807–822,
1984.

A. Unwin, M. Theus, and H. Hofmann. Graphics of Large Datasets. Springer, 2006.

Another technique is to overlay contours of a 2d kernel density
estimate - this is somewhat similar to a bag-plot, although with
different underlying assumptions.

It's also important to think about whether you're interested in the
joint (e.g. bag plot) or conditional (e.g. quantile regression)
density.

Hadley


On 10/2/07, Bert Gunter [EMAIL PROTECTED] wrote:


 Folks:

 I found the references in the previous replies to this vexing data
 visualization issue to be quite interesting and useful. I think it fair to
 say that there is no single best way to do this -- it all depends on what
 you need to learn , and probably several alternative displays will be
 necessary to get the important information the data have to convey.
 However,as always, this issue has been considered before, and it may be
 worthwhile to at least consider an already available standard approach
 using shingles and a trellis-type plot. ?xyplot and ?shingle should get you
 started (you probably want to shingle or bin on quantiles of y). The
 canonical reference is Bill Cleveland's VISUALIZING DATA (see coplots).


 Bert Gunter
 Genentech Nonclinical Statistics


 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
 Behalf Of Jim Porzak
 Sent: Tuesday, October 02, 2007 11:19 AM
 To: Karin Lagesen
 Cc: r-help@r-project.org
 Subject: Re: [R] continuous boxplot?

 Karin,

 I like to use bagplots in these cases where there are a lot of cases and
 scatter plots become one big smudge.

 See
 http://www.wiwi.uni-bielefeld.de/~wolf/software/R-wtools/bagplot/bagplot.pdf

 And some further examples on slides 36 - 39 of
 http://www.porzak.com/JimArchive/JimPorzak_CIwithR_useR2006_tutorial.pdf

 --
 HTH,
 Jim Porzak
 Responsys, Inc.
 San Francisco, CA
 http://www.linkedin.com/in/jimporzak

 On 10/1/07, Karin Lagesen [EMAIL PROTECTED] wrote:
 
 
 
  I have two vectors x and y, which I would like to plot against each
  other. I am also displaying other data in this plot. However, I have
  about 1 million points to plot, and just plotting them x againt y is
  not very informative. What I'd like to do is to do sort of a
  continuous box plot.
 
  My x values goes from -1 to 1 and my y values from 0 to 1, so I4d like
  to plot the median and quantiles, and possibly also all of the
  outliers somehow. Are there any facilities in R for doing something
  like this, or would I need to do this the hard coded way?
 
  Thankyou very much for your help!
 
  Karin
  --
  Karin Lagesen, PhD student
  [EMAIL PROTECTED]
  http://folk.uio.no/karinlag
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Strange names when creating a data.frame: Difference between - and =

2007-10-02 Thread Prof Brian Ripley
On Tue, 2 Oct 2007, John Kane wrote:

 This is just a curiosity question.  Why do the two
 different syntaxes for df1 and df2 give such different
 results in the names(dfx)?

Because only in the second case did you specify the names in the call.
When you do nas - c(A ...) the argument has no name but you create an 
object called 'nas' in the calling environment (the workspace, it looks 
like).

Perhaps you are confused by thinking '=' is an assignment operator in R, 
but it is so only when it is not used in any other sense.


 Thanks


 df1 - data.frame(nas = c(A,  B , B ,C ,B,
 A, D),nums = c(3,2,1,   1,   2,3,  7))

 df2 - data.frame(nas  - c(A,  B , B ,C ,B,
 A, D),nums  - c(3,2,1,   1,   2,3,
 7))

 df1; df2

 df1
  nas nums
 1   A3
 2   B2
.   .

  df2

   nasc..ABBCBAD..
 numsc.3..2..1..1..2..3..7.
 1   A
3
 2   B

 3
 .   .

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] unique rows in data frame

2007-10-02 Thread John Kane
?aggregate

x - data.frame(a= as.factor(c(A,  B , B ,C
,B, A, D)),
 b = c(3,2,1,   1,   2,3,  7))

aggregate(x[,2], list(x[,1]), mean)


--- Dieter Best [EMAIL PROTECTED] wrote:

 Hello there, 

   I have a data frame a small version of which could
 look like the following:

 x1 x2
 1  A  1
 2  B  2
 3  B  3

   Now I need to remove rows which are duplicate in
 x1, i.e. in the example above I would remove row 3. 

   I have an ugly solution with for and while loops
 and ifs. ... And of course my data set is much
 larger and my solution takes along time. 

   Any ideas what could be the best way to do this in
 R?

   Better yet: I actually would like to sort of
 collapse row 2 and 3 in the example above by
 replacing 2 and 3 with a new row 2 which has in x2
 the mean of old x2 of row 2 and 3 (maybe this is
 poorly said).

   Anyways, thanks a lot in advance for suggestions.

   -- D
 
 

 -
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained,
 reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plot question

2007-10-02 Thread Tiandao Li
Hello Eric,

I got it right this time. You are right, regression line is way too messy. 
For each gene, plot the lines connecting the points of the average 
intensity at different concentrations, and write gene ID at the end of 
line. I am new to the R graphics, any help is appreciated.


On Tue, 2 Oct 2007, Eric Thompson wrote:

But you did not use the command I suggested: you replaced x with
colnames(n), which is a vector of characters. Characters such as 25p
has little meaning for plotting:

 as.numeric(25p)
[1] NA

So you've tried to plot a bunch of NA's, which is why it doesn't know
what limits to use for xlim.

That is why I gave you the commands to convert that character array
colnames(n) to a meaningful numeric array. It is unreasonable to
expect R to guess that 25p = 25.

x - colnames(n)
x - as.numeric(substr(x, 1, nchar(x) - 1))
matplot(x, t(n), pch = 1, axes = FALSE)
axis(side = 2)
axis(side = 3, at = x, lab = colnames(n))
box()



On 10/2/07, Tiandao Li [EMAIL PROTECTED] wrote:
 However, I got the following msg.

  matplot(colnames(n), t(n), pch = 1, axes = FALSE)
 Error in plot.window(xlim, ylim, log, asp, ...) :
 need finite 'xlim' values
 In addition: Warning messages:
 1: NAs introduced by coercion in: as.double.default(x)
 2: no non-missing arguments to min; returning Inf in: min(x)
 3: no non-missing arguments to max; returning -Inf in: max(x)
 4: NAs introduced by coercion in: as.double.default(x)



 On Tue, 2 Oct 2007, Eric Thompson wrote:

 Okay. If you want to customize the axis labels, you can suppress the
 defaults by changing the matplot call to

 matplot(x, t(n), pch = 1, axes = FALSE)

 And then adding them how you want:

 axis(side = 2)
 axis(side = 3, at = x, lab = colnames(n))
 box()



 On 10/2/07, Tiandao Li [EMAIL PROTECTED] wrote:
  Thanks for your quick reply, Eric.
 
  I want plot colnames(n) as string on x-axis. If the regression lines don't
  fit the data very well, it is OK, the plot is only for quality check.
 
 
  On Tue, 2 Oct 2007, Eric Thompson wrote:
 
  If I've correctly interpreted what you want, you first need to get the x 
  values:
 
  x - colnames(n)
  x - as.numeric(substr(x, 1, nchar(x) - 1))
 
  Then it seems fairly easy to use matplot to get the values with
  different colors for each concentration
 
  dim(x) - c(length(x), 1)
  matplot(x, t(n), pch = 1)
 
  But this does not look like a simple line will fit the data for each
  gene well, so perhaps I've misunderstood something. You will have to
  decide how you want to do the regression. It will also get very messy
  and difficult to read with 20 lines (a different regression for each
  gene). To do the regressions, plot the lines, and label with the gene
  ID, see
 
  ?lm
  ?predict
  ?abline
  ?text
 
 
 
 
  On 10/2/07, Tiandao Li [EMAIL PROTECTED] wrote:
   Hello,
  
   I have a question about how to plot a series of data. The folloqing is my
   data matrix of n
n
25p5p  2.5p 0.5p
   16B-E06.g 45379  4383  5123   45
   16B-E06.g 45138  4028  6249   52
   16B-E06.g 48457  4267  5470   54
   16B-E06.g 47740  4676  6769   48
   37B-B02.g 42860  6152 19276   72
   35B-A02.g 48325 12863 38274  143
   35B-A02.g 48410 12806 39013  175
   35B-A02.g 48417  9057 40923  176
   35B-A02.g 51403 13865 43338  161
   45B-C12.g 50939  3656  5783   43
   45B-C12.g 52356  5524  6041   55
   45B-C12.g 49338  5141  5266   41
   45B-C12.g 51567  3915  5677   43
   35A-G04.g 40365  5513  6971   32
   35B-D01.g 54217 12607 13067   93
   35B-D01.g 55283 11441 14964  101
   35B-D01.g 55041  9626 14928   94
   35B-D01.g 54058  9465 14912   88
   35B-A04.g 42745 12080 34271  105
   35B-A04.g 41055 12423 34874  126
  
   colnames(n) is concentrations, rownames(n) is gene IDs, and the rest is
   Intensity. I want to plot the data this way.
   x-axis is colnames(n) in the order of 0.5p, 2.5p,5p,and 25p.
   y-axis is Intensity
   Inside of plot is the points of intensity over 4 concentrations, points
   from different genes have different color or shape. A regression line of
   each genes crosss different concetrations, and at the end of line is gene
   IDs.
  
   Thanks,
  
   Tiandao
  
   __
   R-help@r-project.org mailing list
   https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide 
   http://www.R-project.org/posting-guide.html
   and provide commented, minimal, self-contained, reproducible code.
  
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, 

Re: [R] Strange names when creating a data.frame: Difference between - and =

2007-10-02 Thread Vladimir Eremeev



John Kane-2 wrote:
 
 This is just a curiosity question.  Why do the two
 different syntaxes for df1 and df2 give such different
 results in the names(dfx)?
 
 Thanks
 
 
 df1 - data.frame(nas = c(A,  B , B ,C ,B,
 A, D),nums = c(3,2,1,   1,   2,3,  7))
 
 df2 - data.frame(nas  - c(A,  B , B ,C ,B,
 A, D),nums  - c(3,2,1,   1,   2,3, 
 7))
 
 df1; df2
 
df1
   nas nums
 1   A3
 2   B2
 .   .
 
  df2
 
nasc..ABBCBAD..
 numsc.3..2..1..1..2..3..7.
 1   A 
 3
 2   B 
 
 3
 .   .
 
You specify argument names in the first function call in the form
agrument.name=value
Argument names become column names in the data frame.
- is the assignment operator, it returns the assigned value,  it is not
the specification of a function argument. 

Second function call does not contain argument names, so, they are
constructed from expressions.
You should have the variables nas and nums in your workspace now.
ls() should show them.
If you type data.frame (without parentheses, ( and )) on the R command
prompt it will print the body of this function. You can study it and
discover how it works.
-- 
View this message in context: 
http://www.nabble.com/Strange-names-when-creating-a-data.frame%3A-Difference-between-%3C--and-%3D-tf4557141.html#a13006205
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plot question

2007-10-02 Thread hadley wickham
On 10/2/07, Tiandao Li [EMAIL PROTECTED] wrote:
 Hello,

 I have a question about how to plot a series of data. The folloqing is my
 data matrix of n
  n
  25p5p  2.5p 0.5p
 16B-E06.g 45379  4383  5123   45
 16B-E06.g 45138  4028  6249   52
 16B-E06.g 48457  4267  5470   54
 16B-E06.g 47740  4676  6769   48
 37B-B02.g 42860  6152 19276   72
 35B-A02.g 48325 12863 38274  143
 35B-A02.g 48410 12806 39013  175
 35B-A02.g 48417  9057 40923  176
 35B-A02.g 51403 13865 43338  161
 45B-C12.g 50939  3656  5783   43
 45B-C12.g 52356  5524  6041   55
 45B-C12.g 49338  5141  5266   41
 45B-C12.g 51567  3915  5677   43
 35A-G04.g 40365  5513  6971   32
 35B-D01.g 54217 12607 13067   93
 35B-D01.g 55283 11441 14964  101
 35B-D01.g 55041  9626 14928   94
 35B-D01.g 54058  9465 14912   88
 35B-A04.g 42745 12080 34271  105
 35B-A04.g 41055 12423 34874  126

 colnames(n) is concentrations, rownames(n) is gene IDs, and the rest is
 Intensity. I want to plot the data this way.
 x-axis is colnames(n) in the order of 0.5p, 2.5p,5p,and 25p.
 y-axis is Intensity
 Inside of plot is the points of intensity over 4 concentrations, points
 from different genes have different color or shape. A regression line of
 each genes crosss different concetrations, and at the end of line is gene
 IDs.

I might do it something like this:

df - structure(list(gene = structure(c(1L, 1L, 1L, 1L, 6L, 3L, 3L,
3L, 3L, 7L, 7L, 7L, 7L, 2L, 5L, 5L, 5L, 5L, 4L, 4L), .Label = c(16B-E06.g,
35A-G04.g, 35B-A02.g, 35B-A04.g, 35B-D01.g, 37B-B02.g,
45B-C12.g), class = factor), X25p = c(45379L, 45138L, 48457L,
47740L, 42860L, 48325L, 48410L, 48417L, 51403L, 50939L, 52356L,
49338L, 51567L, 40365L, 54217L, 55283L, 55041L, 54058L, 42745L,
41055L), X5p = c(4383L, 4028L, 4267L, 4676L, 6152L, 12863L, 12806L,
9057L, 13865L, 3656L, 5524L, 5141L, 3915L, 5513L, 12607L, 11441L,
9626L, 9465L, 12080L, 12423L), X2.5p = c(5123L, 6249L, 5470L,
6769L, 19276L, 38274L, 39013L, 40923L, 43338L, 5783L, 6041L,
5266L, 5677L, 6971L, 13067L, 14964L, 14928L, 14912L, 34271L,
34874L), X0.5p = c(45L, 52L, 54L, 48L, 72L, 143L, 175L, 176L,
161L, 43L, 55L, 41L, 43L, 32L, 93L, 101L, 94L, 88L, 105L, 126L
)), .Names = c(gene, X25p, X5p, X2.5p, X0.5p),
class = data.frame, row.names = c(NA, -20L))

library(reshape)
library(ggplot2)

dfm - melt(df, id=1)
names(dfm) - c(gene, conc, intensity)
dfm$conc - as.numeric(gsub([Xp], , as.character(dfm$conc)))

qplot(conc, intensity, data=dfm, colour=gene, log=xy) + geom_smooth(method=lm)

Note that I've converted the concentrations to numeric values and
plotted them on a log scale.  If you want to treat concentration as a
factor, then you'll need the following code:

dfm$conc - factor(dfm$conc)
qplot(conc, intensity, data=dfm, colour=gene, group=gene, log=y) +
geom_smooth(method=lm, xseq=levels(dfm$conc))

But in that case, fitting a linear model seems a bit dubious.

Note that you can also use this format of data with lattice:

library(lattice)
xyplot(intensity ~ conc, data=dfm, type=c(p,r), group=gene, auto.key=T)

Hadley

-- 
http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Trouble obtaining results from a loop

2007-10-02 Thread Luke Neraas
#Hello,
#I have a question about obtaining results from a loop I have written.
#Below is a sample of individual genotypes from a genetic question I am
working on called P.genotype.sample .

P.genotype.sample-matrix(10,10,10)
P.genotype.sample[,1]-c(2,2,1,5,1,1,5,6,1,3)
P.genotype.sample[,2]-c(6,3,3,6,8,1,6,7,2,3)
P.genotype.sample[,3]-c(2,2,2,3,3,2,2,2,3,3)
P.genotype.sample[,4]-c(2,8,8,3,8,2,8,3,4,3)
P.genotype.sample[,5]-c(3,3,8,3,6,1,1,1,1,3)
P.genotype.sample[,6]-c(6,3,8,8,6,8,7,3,1,7)
P.genotype.sample[,7]-c(1,5,5,1,5,1,1,5,5,5)
P.genotype.sample[,8]-c(5,5,5,5,7,6,7,5,5,8)
P.genotype.sample[,9]-c(5,5,8,5,5,5,5,2,5,2)
P.genotype.sample[,10]-c(5,8,8,8,8,5,8,5,8,2)
P.genotype.sample

# I would like to count the number of times the same number ( 1 through 8)
occurs in each row
# and only in pairs of columns; 1:2, 3:4, 5:6, 7:8,  9:10.  below is the
loop that i have written
# to perform this operation and put in the results in a matrix that is 5
columns wide and 8 rows long (called result).
# I can get the operation to run but it gives me the results for the last
pair of columns and overwrites
# the results into every column of my result matrix. I am puzzled as to why
this is occurring.

Loci-5
Locus.Columns- Loci*2

x-seq(1,Locus.Columns,2)

result-matrix(8,8,Loci)


for (i in 1:Loci){
for (j in 1:8){
for (k in x){
result[j,i] - sum(P.genotype.sample[,k]==j  P.genotype.sample[,k+1]==j)

}
}
}

result


# Below is a long and drawn out way to achieve the result I am looking for
in the loop above.

true.result-matrix(8,8,Loci)

for(j in 1:8){
true.result[j,1] -sum(P.genotype.sample[,1]==j  P.genotype.sample[,2]==j)
}

for(j in 1:8){
true.result[j,2] -sum(P.genotype.sample[,3]==j  P.genotype.sample[,4]==j)
}

for(j in 1:8){
true.result[j,3] -sum(P.genotype.sample[,5]==j  P.genotype.sample[,6]==j)
}

for(j in 1:8){
true.result[j,4] -sum(P.genotype.sample[,7]==j  P.genotype.sample[,8]==j)
}

for(j in 1:8){
true.result[j,5] -sum(P.genotype.sample[,9]==j  P.genotype.sample[,10]==j)
}

true.result

# Any suggestions or help on how to make this loop work would be greatly
appreciated.

# Thanks in advance


Luke Neraas

[EMAIL PROTECTED]

University of Alaska Fairbanks
School of Fisheries and Ocean Sciences
11120 Glacier Highway
UAF Fisheries Division
Juneau, AK 99801

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plot question

2007-10-02 Thread Tiandao Li

Thanks for your help, Hadley. I want to treat concentration as factor, and 
the 2nd and 3rd part of codes are what I wanted. However, how to draw the 
lines to connect the points of average intensity of each gene at different 
concentrations?

On Tue, 2 Oct 2007, hadley wickham wrote:

On 10/2/07, Tiandao Li [EMAIL PROTECTED] wrote:
 Hello,

 I have a question about how to plot a series of data. The folloqing is my
 data matrix of n
  n
  25p5p  2.5p 0.5p
 16B-E06.g 45379  4383  5123   45
 16B-E06.g 45138  4028  6249   52
 16B-E06.g 48457  4267  5470   54

 colnames(n) is concentrations, rownames(n) is gene IDs, and the rest is
 Intensity. I want to plot the data this way.
 x-axis is colnames(n) in the order of 0.5p, 2.5p,5p,and 25p.
 y-axis is Intensity
 Inside of plot is the points of intensity over 4 concentrations, points
 from different genes have different color or shape. A regression line of
 each genes crosss different concetrations, and at the end of line is gene
 IDs.

I might do it something like this:

df - structure(list(gene = structure(c(1L, 1L, 1L, 1L, 6L, 3L, 3L,
3L, 3L, 7L, 7L, 7L, 7L, 2L, 5L, 5L, 5L, 5L, 4L, 4L), .Label = c(16B-E06.g,
35A-G04.g, 35B-A02.g, 35B-A04.g, 35B-D01.g, 37B-B02.g,
45B-C12.g), class = factor), X25p = c(45379L, 45138L, 48457L,
47740L, 42860L, 48325L, 48410L, 48417L, 51403L, 50939L, 52356L,
49338L, 51567L, 40365L, 54217L, 55283L, 55041L, 54058L, 42745L,
41055L), X5p = c(4383L, 4028L, 4267L, 4676L, 6152L, 12863L, 12806L,
9057L, 13865L, 3656L, 5524L, 5141L, 3915L, 5513L, 12607L, 11441L,
9626L, 9465L, 12080L, 12423L), X2.5p = c(5123L, 6249L, 5470L,
6769L, 19276L, 38274L, 39013L, 40923L, 43338L, 5783L, 6041L,
5266L, 5677L, 6971L, 13067L, 14964L, 14928L, 14912L, 34271L,
34874L), X0.5p = c(45L, 52L, 54L, 48L, 72L, 143L, 175L, 176L,
161L, 43L, 55L, 41L, 43L, 32L, 93L, 101L, 94L, 88L, 105L, 126L
)), .Names = c(gene, X25p, X5p, X2.5p, X0.5p),
class = data.frame, row.names = c(NA, -20L))

library(reshape)
library(ggplot2)

dfm - melt(df, id=1)
names(dfm) - c(gene, conc, intensity)
dfm$conc - as.numeric(gsub([Xp], , as.character(dfm$conc)))

qplot(conc, intensity, data=dfm, colour=gene, log=xy) + geom_smooth(method=lm)

Note that I've converted the concentrations to numeric values and
plotted them on a log scale.  If you want to treat concentration as a
factor, then you'll need the following code:

dfm$conc - factor(dfm$conc)
qplot(conc, intensity, data=dfm, colour=gene, group=gene, log=y) +
geom_smooth(method=lm, xseq=levels(dfm$conc))

But in that case, fitting a linear model seems a bit dubious.

Note that you can also use this format of data with lattice:

library(lattice)
xyplot(intensity ~ conc, data=dfm, type=c(p,r), group=gene, auto.key=T)

Hadley

-- 
http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] continuous boxplot?

2007-10-02 Thread Deepayan Sarkar
On 10/2/07, Bert Gunter [EMAIL PROTECTED] wrote:


 Folks:

 I found the references in the previous replies to this vexing data
 visualization issue to be quite interesting and useful. I think it fair to
 say that there is no single best way to do this -- it all depends on what
 you need to learn , and probably several alternative displays will be
 necessary to get the important information the data have to convey.
 However,as always, this issue has been considered before, and it may be
 worthwhile to at least consider an already available standard approach
 using shingles and a trellis-type plot. ?xyplot and ?shingle should get you
 started (you probably want to shingle or bin on quantiles of y). The
 canonical reference is Bill Cleveland's VISUALIZING DATA (see coplots).

A better reference for that is Elements of Graphing Data, where
boxplots created after discretizing the x variable is used to motivate
lowess. It's pretty much the only legitimate use I have seen of
boxplots with shingles (it's a cheap way to get both a sense of
location and variability non-parametrically). I vaguely remember
seeing an even better reference, but can't find it now.

-Deepayan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plot question

2007-10-02 Thread hadley wickham
The last example does connect the points of average intensity, doesn't it?

Hadley

On 10/2/07, Tiandao Li [EMAIL PROTECTED] wrote:

 Thanks for your help, Hadley. I want to treat concentration as factor, and
 the 2nd and 3rd part of codes are what I wanted. However, how to draw the
 lines to connect the points of average intensity of each gene at different
 concentrations?

 On Tue, 2 Oct 2007, hadley wickham wrote:

 On 10/2/07, Tiandao Li [EMAIL PROTECTED] wrote:
  Hello,
 
  I have a question about how to plot a series of data. The folloqing is my
  data matrix of n
   n
   25p5p  2.5p 0.5p
  16B-E06.g 45379  4383  5123   45
  16B-E06.g 45138  4028  6249   52
  16B-E06.g 48457  4267  5470   54
 
  colnames(n) is concentrations, rownames(n) is gene IDs, and the rest is
  Intensity. I want to plot the data this way.
  x-axis is colnames(n) in the order of 0.5p, 2.5p,5p,and 25p.
  y-axis is Intensity
  Inside of plot is the points of intensity over 4 concentrations, points
  from different genes have different color or shape. A regression line of
  each genes crosss different concetrations, and at the end of line is gene
  IDs.

 I might do it something like this:

 df - structure(list(gene = structure(c(1L, 1L, 1L, 1L, 6L, 3L, 3L,
 3L, 3L, 7L, 7L, 7L, 7L, 2L, 5L, 5L, 5L, 5L, 4L, 4L), .Label = c(16B-E06.g,
 35A-G04.g, 35B-A02.g, 35B-A04.g, 35B-D01.g, 37B-B02.g,
 45B-C12.g), class = factor), X25p = c(45379L, 45138L, 48457L,
 47740L, 42860L, 48325L, 48410L, 48417L, 51403L, 50939L, 52356L,
 49338L, 51567L, 40365L, 54217L, 55283L, 55041L, 54058L, 42745L,
 41055L), X5p = c(4383L, 4028L, 4267L, 4676L, 6152L, 12863L, 12806L,
 9057L, 13865L, 3656L, 5524L, 5141L, 3915L, 5513L, 12607L, 11441L,
 9626L, 9465L, 12080L, 12423L), X2.5p = c(5123L, 6249L, 5470L,
 6769L, 19276L, 38274L, 39013L, 40923L, 43338L, 5783L, 6041L,
 5266L, 5677L, 6971L, 13067L, 14964L, 14928L, 14912L, 34271L,
 34874L), X0.5p = c(45L, 52L, 54L, 48L, 72L, 143L, 175L, 176L,
 161L, 43L, 55L, 41L, 43L, 32L, 93L, 101L, 94L, 88L, 105L, 126L
 )), .Names = c(gene, X25p, X5p, X2.5p, X0.5p),
 class = data.frame, row.names = c(NA, -20L))

 library(reshape)
 library(ggplot2)

 dfm - melt(df, id=1)
 names(dfm) - c(gene, conc, intensity)
 dfm$conc - as.numeric(gsub([Xp], , as.character(dfm$conc)))

 qplot(conc, intensity, data=dfm, colour=gene, log=xy) + 
 geom_smooth(method=lm)

 Note that I've converted the concentrations to numeric values and
 plotted them on a log scale.  If you want to treat concentration as a
 factor, then you'll need the following code:

 dfm$conc - factor(dfm$conc)
 qplot(conc, intensity, data=dfm, colour=gene, group=gene, log=y) +
 geom_smooth(method=lm, xseq=levels(dfm$conc))

 But in that case, fitting a linear model seems a bit dubious.

 Note that you can also use this format of data with lattice:

 library(lattice)
 xyplot(intensity ~ conc, data=dfm, type=c(p,r), group=gene, auto.key=T)

 Hadley

 --
 http://had.co.nz/



-- 
http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] unique rows in data frame

2007-10-02 Thread Dieter Best
Hi guys,
   
  thanks for all your help. I didn't know about aggregate yet. 
  

John Kane [EMAIL PROTECTED] wrote:
  ?aggregate

x - data.frame(a= as.factor(c(A, B , B ,C
,B, A, D)),
b = c(3, 2, 1, 1, 2, 3, 7))

aggregate(x[,2], list(x[,1]), mean)


--- Dieter Best wrote:

 Hello there, 
 
 I have a data frame a small version of which could
 look like the following:
 
 x1 x2
 1 A 1
 2 B 2
 3 B 3
 
 Now I need to remove rows which are duplicate in
 x1, i.e. in the example above I would remove row 3. 
 
 I have an ugly solution with for and while loops
 and ifs. ... And of course my data set is much
 larger and my solution takes along time. 
 
 Any ideas what could be the best way to do this in
 R?
 
 Better yet: I actually would like to sort of
 collapse row 2 and 3 in the example above by
 replacing 2 and 3 with a new row 2 which has in x2
 the mean of old x2 of row 2 and 3 (maybe this is
 poorly said).
 
 Anyways, thanks a lot in advance for suggestions.
 
 -- D
 
 
 
 -
 
 [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained,
 reproducible code.
 






   
-

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Trouble obtaining results from a loop

2007-10-02 Thread jim holtman
I think this gives you what you are looking for:

 P.genotype.sample
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]262236155 5
 [2,]232833555 8
 [3,]132888558 8
 [4,]563338155 8
 [5,]183866575 8
 [6,]112218165 5
 [7,]562817175 8
 [8,]672313552 5
 [9,]123411555 8
[10,]333337582 2

 # create result matrix
 result - matrix(0, ncol=5, nrow=8)

 # iterate for pairs of columns
 for (i in seq(1, ncol(P.genotype.sample), by=2)){
+ # get the rows that match
+ match.row - P.genotype.sample[,i] == P.genotype.sample[, i + 1]
+ # get the values from the rows that match
+ same - P.genotype.sample[match.row, i]
+ # now increment the 'result' matrix
+ for (j in same) result[j, (i + 1) / 2] - result[j, (i + 1) / 2] + 1
+ }
 result
 [,1] [,2] [,3] [,4] [,5]
[1,]10100
[2,]02001
[3,]12100
[4,]00000
[5,]00042
[6,]00100
[7,]00000
[8,]00101



On 10/2/07, Luke Neraas [EMAIL PROTECTED] wrote:
 #Hello,
 #I have a question about obtaining results from a loop I have written.
 #Below is a sample of individual genotypes from a genetic question I am
 working on called P.genotype.sample .

 P.genotype.sample-matrix(10,10,10)
 P.genotype.sample[,1]-c(2,2,1,5,1,1,5,6,1,3)
 P.genotype.sample[,2]-c(6,3,3,6,8,1,6,7,2,3)
 P.genotype.sample[,3]-c(2,2,2,3,3,2,2,2,3,3)
 P.genotype.sample[,4]-c(2,8,8,3,8,2,8,3,4,3)
 P.genotype.sample[,5]-c(3,3,8,3,6,1,1,1,1,3)
 P.genotype.sample[,6]-c(6,3,8,8,6,8,7,3,1,7)
 P.genotype.sample[,7]-c(1,5,5,1,5,1,1,5,5,5)
 P.genotype.sample[,8]-c(5,5,5,5,7,6,7,5,5,8)
 P.genotype.sample[,9]-c(5,5,8,5,5,5,5,2,5,2)
 P.genotype.sample[,10]-c(5,8,8,8,8,5,8,5,8,2)
 P.genotype.sample

 # I would like to count the number of times the same number ( 1 through 8)
 occurs in each row
 # and only in pairs of columns; 1:2, 3:4, 5:6, 7:8,  9:10.  below is the
 loop that i have written
 # to perform this operation and put in the results in a matrix that is 5
 columns wide and 8 rows long (called result).
 # I can get the operation to run but it gives me the results for the last
 pair of columns and overwrites
 # the results into every column of my result matrix. I am puzzled as to why
 this is occurring.

 Loci-5
 Locus.Columns- Loci*2

 x-seq(1,Locus.Columns,2)

 result-matrix(8,8,Loci)


 for (i in 1:Loci){
for (j in 1:8){
for (k in x){
result[j,i] - sum(P.genotype.sample[,k]==j  P.genotype.sample[,k+1]==j)

}
}
}

 result


 # Below is a long and drawn out way to achieve the result I am looking for
 in the loop above.

 true.result-matrix(8,8,Loci)

for(j in 1:8){
 true.result[j,1] -sum(P.genotype.sample[,1]==j  P.genotype.sample[,2]==j)
}

for(j in 1:8){
 true.result[j,2] -sum(P.genotype.sample[,3]==j  P.genotype.sample[,4]==j)
}

for(j in 1:8){
 true.result[j,3] -sum(P.genotype.sample[,5]==j  P.genotype.sample[,6]==j)
}

for(j in 1:8){
 true.result[j,4] -sum(P.genotype.sample[,7]==j  P.genotype.sample[,8]==j)
}

for(j in 1:8){
 true.result[j,5] -sum(P.genotype.sample[,9]==j  P.genotype.sample[,10]==j)
}

 true.result

 # Any suggestions or help on how to make this loop work would be greatly
 appreciated.

 # Thanks in advance


 Luke Neraas

 [EMAIL PROTECTED]

 University of Alaska Fairbanks
 School of Fisheries and Ocean Sciences
 11120 Glacier Highway
 UAF Fisheries Division
 Juneau, AK 99801

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Protect your brand online

2007-10-02 Thread London4Jobs
This is a text part of the message.
It is shown for the users of old-style e-mail clients

__
This email has been scanned by the MessageLabs Email Security System.
For more information please visit http://www.messagelabs.com/email 
__
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] R routines vs. MATLAB/SPSS Routines

2007-10-02 Thread Matthew Dubins
Hi all,

I've become quite enamored of R lately, and have decided to try to teach 
some of its basics (reading in data, manipulation and classical stats 
analyses) to my fellow grad students at the University of Toronto.  I 
sent out a mass email and have already received some positive 
responses.  One student, however, wanted to know what differentiates the 
routines that R uses, from those that MATLAB and SPSS use.  In other 
words, in what respects do R routines work faster/more efficiently/more 
accurately than those of MATLAB/SPSS. 

I thank you in advance for any answer you can give me (or rather, the 
inquiring student).

Cheers,
Matthew Dubins

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] GARCH simulations

2007-10-02 Thread twickra
Hey,

Is there any way to simulate a GARCH(1,1) data set?

Thanks
Tharanga

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculating proportions from a data frame rather than a table

2007-10-02 Thread Farrel Buchinsky
How do you create a table from a data frame? I tried as.table(
name.of.data.frame) but it bombed out.
I will include the exact error message in my next posting. If I recall
correctly, it said that the data.frame could not be coerced to a table.


On 10/2/07, John Kane [EMAIL PROTECTED] wrote:

 What am I missing here?

 Cannot you just create the table from the data.frame
 and apply prop.table()to it?

 --- Farrel Buchinsky [EMAIL PROTECTED] wrote:

  When one has raw data it is easy to create a table
  of one variable against
  another and then calculate proportions
  For example
  a.nice.table-table(a,b)
  prop.table(a.nice.table,1)
 
  However, I looked at several papers and created a
  data frame of the
  aggregate data. That means I acually created a table
  except it is a data
  frame. The first column lists the name of the first
  author and the year.
  I cannot find how to convert the data frame to a
  table so I can use great
  functions such as prop.table and margin.table.
 
  Alas I tried, rowSums but it provides lousy output
  without listing the names
  of th papers.
  I have thought of going through the reshape package
  but I suspect that there
  is an easier way to convert a data frame to a table.
  Is there?
 
  --
  Farrel Buchinsky



  Be smarter than spam. See how smart SpamGuard is at giving junk email
 the boot with the All-new Yahoo! Mail at
 http://mrd.mail.yahoo.com/try_beta?.intl=ca




-- 
Farrel Buchinsky
GrandCentral Tel: (412) 567-7870

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Design package: plot summary

2007-10-02 Thread Frank E Harrell Jr
Cary Dehing-Oberije wrote:
 Hi everybody,
 
 I am a new user of R, design package.
 I am trying to plot the estimated hazard ratio's of my cox regression
 model with the confidence intervals. But I keep getting the
 message:Error in `contrasts-`(`*tmp*`, value = contr.treatment) : 
 contrasts can be applied only to factors with 2 or more levels
 
 I have dichotomous, categorical as well as continuous variables in the
 model, but I can only plot the dichotomous variable.
 
 I started with the following syntax:
 
 dd - datadist(x1, x2, x3)

Please follow the posting guide.  Please provide R commands that 
generate values of x1, x2, x3 that cause the problem you reported below.

 options(datadist='dd')
 
 fit-cph(S~ x1 +x2 +x3, x=T,y=T)
 
 plothr- summary(fit)
 plot(plothr, log=T)

or just plot(summary(fit), log=TRUE)

 
 Because the levels defined by datadist didn't seem to work I tried to
 adjust the levels after fitting the model:
 
 dd$limits[Adjust to,x1] - 1 
 dd$limits[Adjust to,x2] - 30

Don't do that.

Frank


 fit - update(fit)
 plothr- summary(fit)
 plot(plothr, log=T)
 
 Finally I have tried: summary(fit, x1=1, x2=30, x3=3)
 
 But this option didn't work either. Can anybody tell me what goes wrong
 and why?
 Thanks for your help,
 
 Cary
 
 Cary Dehing-Oberije
 Researcher CAT project
 p/a MAASTRO clinic
 Dr. Tanslaan 12
 6229 ET Maastricht
 The Netherlands
 phone: 088 - 4455666 / 088 - 4455767
 email: [EMAIL PROTECTED] 
 
 
 
 
 
 
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 


-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] unique rows in data frame

2007-10-02 Thread Gabor Grothendieck
Try this:

 DF[!duplicated(DF$x1), ]
  x1 x2
1  A  1
2  B  2
 # or
 subset(DF, !duplicated(x1))
  x1 x2
1  A  1
2  B  2


On 10/2/07, Dieter Best [EMAIL PROTECTED] wrote:
 Hello there,

  I have a data frame a small version of which could look like the following:

x1 x2
 1  A  1
 2  B  2
 3  B  3

  Now I need to remove rows which are duplicate in x1, i.e. in the example 
 above I would remove row 3.

  I have an ugly solution with for and while loops and ifs. ... And of course 
 my data set is much larger and my solution takes along time.

  Any ideas what could be the best way to do this in R?

  Better yet: I actually would like to sort of collapse row 2 and 3 in the 
 example above by replacing 2 and 3 with a new row 2 which has in x2 the mean 
 of old x2 of row 2 and 3 (maybe this is poorly said).

  Anyways, thanks a lot in advance for suggestions.

  -- D



 -

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R routines vs. MATLAB/SPSS Routines

2007-10-02 Thread Katharine Mullen
I would stress the advantages of the free and open source nature of R over
the proprietary programs you mention. Because R is free (as in beer), your
student will have access to it even when they are free of the university
that I presume buys a MATLAB/SPSS license for them. And because R is open
source, when your student has an idea regarding how to make R work
faster/more efficiently/more accurately, they can modify the source code
at whatever level they like.  They can also examine at whatever level they
like what R is doing, so there are no black boxes involved.

On Tue, 2 Oct 2007, Matthew Dubins wrote:

 Hi all,

 I've become quite enamored of R lately, and have decided to try to teach
 some of its basics (reading in data, manipulation and classical stats
 analyses) to my fellow grad students at the University of Toronto.  I
 sent out a mass email and have already received some positive
 responses.  One student, however, wanted to know what differentiates the
 routines that R uses, from those that MATLAB and SPSS use.  In other
 words, in what respects do R routines work faster/more efficiently/more
 accurately than those of MATLAB/SPSS.

 I thank you in advance for any answer you can give me (or rather, the
 inquiring student).

 Cheers,
 Matthew Dubins

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to modify function for list

2007-10-02 Thread Chung-hong Chan
Dear R Gurus,

I have a function which calculate the BMI standard deviation score
based on the age and gender specific L, M, S value.
I have written a function like this

bmisds - function (age, gender, bmi){
  if (age ==1  gender ==1)
{
  bmif - c(-1.013,16.133,0.07656)
}
  else if (age ==1  gender ==0)
{
  bmif - c(-1.79,16.421,0.07149)
}
  else if (age == 2  gender == 1)
{
  bmif - c(-1.403,15.58,0.07342)
}
  else if (age == 2  gender == 0)
{
  bmif - c(-2.045,15.861,0.07116)
}


  (((bmi/bmif[2])**bmif[1] -1)/(bmif[1]*bmif[3]))
}

This function work when I supply a single variable, e.g. bmisds(1,0,16)

When I try it with a list, e.g.
age - c(12,13,14)
gender - c(0,1,1)
bmi - c(14,15,16)
bmisds(age,gender,bmi)

This function give me wrong answer and error msg.
Warning messages:
1: the condition has length  1 and only the first element will be
used in: if (age == 1  gender == 1) {


How can I midify this function to work with list/vector?

regards,
C


-- 
CH Chan
Research Assistant - KWH
http://www.macgrass.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Factor levels.

2007-10-02 Thread Rolf Turner

I have factors with levels ``Unit, Achieved, and Scholarship; I  
wish to replace these with
U, A, and S.

So I do

fff - factor(fff,labels=c(U,A,S))

This works as long as all of the levels are actually present in the  
factor.  But if ``Scholarship'' is absent
(as if often is) then I get an error.

I can do a workaround such as

fff - factor(c(U,A,S)[fff],levels=c(U,A,S))

but this seems kludgy to me.

Is there a sexier way?

cheers,

Rolf Turner


##
Attention:\ This e-mail message is privileged and confidenti...{{dropped}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.