from:"Steven McKinney"

Re: [R] lmer coefficient distributions and p values

2007-08-15 Thread Steven McKinney


Hi Daniel,

You might want to review further advances by Doug Bates with lme4
since the post you show in your email.

http://tolstoy.newcastle.edu.au/R/e2/help/06/10/3565.html
In this thread Doug Bates discusses fitting using maximum
likelihood for testing purposes.  There is now an anova()
method for lmer() and lmer2() fits performed using method=ML.  
You can compare different models and get p-values for 
p-value obsessed journals using this approach.

I believe I read a thread discussing the use of maximum
likelihood fitting for use in ANOVA tests, and then 
REML fits on final models for better parameter estimates - 
but I can't find that thread.  Hopefully Doug Bates
or anyone involved in that level of discussion can chime in.


See also for example this wiki 
http://wiki.r-project.org/rwiki/doku.php?id=guides:lmer-tests

and the new listserv at
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models


Also check the latest documentation for lme4 and the lmer()
and lmer2() functions at
http://cran.r-project.org/
in the 
Packages ... lme4 
pages.


Hope this helps

Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: smckinney +at+ bccrc +dot+ ca

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3
Canada




-Original Message-
From: [EMAIL PROTECTED] on behalf of Daniel Lakeland
Sent: Wed 8/15/2007 9:34 AM
To: r-help@stat.math.ethz.ch
Subject: [R] lmer coefficient distributions and p values
 
I am helping my wife do some statistical analysis. She is a biologist,
and she has performed some measurements on various genotypes of
mice. My background is in applied mathematics and engineering, and I
have a fairly good statistics background, but I am by no means a PhD
level expert in statistical methods.

We have used the lmer package to fit various models for the various
experiments that she has done (random effects from multiple
measurements for each animal or each trial, and fixed effects from
developmental stage, and genotype etc). The results are fairly clear
cut to me, and I suggested that she publish the results as coefficient
estimates for the relevant contrasts, and their standard error
estimates. However, she has read the statistical guidelines for the
journal and they insist on p values.

I personally think that p values, and sharp-null hypothesis tests are
misguided and should be banned from publications, but it doesn't much
matter what I think compared to what the editors want.

Based on searching the archives, and finding this message:

https://stat.ethz.ch/pipermail/r-help/2006-May/094765.html

I am aware of the theoretical difficulties with p values from lmer
results. I am also aware of the mcmcsamp function which performs some
kind of bayesian sampling from the posterior distribution of the
coefficients based on some kind of prior (I will need to do some more
reading to more fully understand this). Is this the primary way in
which we can estimate the distribution of the model coefficients and
calculate a p value or a confidence interval?

What can I do with the t statistic provided by lmer? If as the above
message suggests, we are uncertain of the correct F and by extension t
distributions to use, what help are the t statistics? I suppose I
could test them against a very low degree of freedom t distribution
(say 3) and publish those p values?

Again, I'm content to ignore p values and stick to estimates, but the
journal isn't.

BTW: thanks to all on this list, I've benefitted greatly from R and
from the archives of help topics.

-- 
Daniel Lakeland
[EMAIL PROTECTED]
http://www.street-artists.org/~dlakelan

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Systematically biased count data regression model

2007-08-09 Thread Steven McKinney

Hi Matthew,

You may be experiencing the classic
'regression towards the mean' phenomenon,
in which case shrinkage estimation may help
with prediction (extremely low and high values
need to be shrunk back towards the mean)

Here's a reference that discusses the issue 
in a manner somewhat related to your situation,
and it has plenty of good references


Application of Shrinkage Techniques in Logistic Regression Analysis: A Case 
Study

E. W. Steyerberg

Statistica Neerlandica, 2001, vol. 55, issue 1, pages 76-88 




Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: smckinney +at+ bccrc +dot+ ca

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3
Canada




-Original Message-
From: [EMAIL PROTECTED] on behalf of Matthew and Kim Bowser
Sent: Thu 8/9/2007 8:43 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Systematically biased count data regression model
 
Dear all,

I am attempting to explain patterns of arthropod family richness
(count data) using a regression model.  It seems to be able to do a
pretty good job as an explanatory model (i.e. demonstrating
relationships between dependent and independent variables), but it has
systematic problems as a predictive model:  It is biased high at low
observed values of family richness and biased low at high observed
values of family richness (see attached pdf).  I have tried diverse
kinds of reasonable regression models mostly as in Zeileis, et al.
(2007), as well as transforming my variables, both with only small
improvements.

Do you have suggestions for making a model that would perform better
as a predictive model?

Thank you for your time.

Sincerely,

Matthew Bowser

STEP student
USFWS Kenai National Wildlife Refuge
Soldotna, Alaska, USA

M.Sc. student
University of Alaska Fairbanks
Fairbankse, Alaska, USA

Reference

Zeileis, A., C. Kleiber, and S. Jackman, 2007. Regression models for
count data in R. Technical Report 53, Department of Statistics and
Mathematics, Wirtschaftsuniversität Wien, Wien, Austria. URL
http://cran.r-project.org/doc/vignettes/pscl/countreg.pdf.

Code

`data` -
structure(list(D = c(4, 5, 12, 4, 9, 15, 4, 8, 3, 9, 6, 17, 4,
9, 6, 9, 3, 9, 7, 11, 17, 3, 10, 8, 9, 6, 7, 9, 7, 5, 15, 15,
12, 9, 10, 4, 4, 15, 7, 7, 12, 7, 12, 7, 7, 7, 5, 14, 7, 13,
1, 9, 2, 13, 6, 8, 2, 10, 5, 14, 4, 13, 5, 17, 12, 13, 7, 12,
5, 6, 10, 6, 6, 10, 4, 4, 12, 10, 3, 4, 4, 6, 7, 15, 1, 8, 8,
5, 12, 0, 5, 7, 4, 9, 6, 10, 5, 7, 7, 14, 3, 8, 15, 14, 7, 8,
7, 8, 8, 10, 9, 2, 7, 8, 2, 6, 7, 9, 3, 20, 10, 10, 4, 2, 8,
10, 10, 8, 8, 12, 8, 6, 16, 10, 5, 1, 1, 5, 3, 11, 4, 9, 16,
3, 1, 6, 5, 5, 7, 11, 11, 5, 7, 5, 3, 2, 3, 0, 3, 0, 4, 1, 12,
16, 9, 0, 7, 0, 11, 7, 9, 4, 16, 9, 10, 0, 1, 9, 15, 6, 8, 6,
4, 6, 7, 5, 7, 14, 16, 5, 8, 1, 8, 2, 10, 9, 6, 11, 3, 16, 3,
6, 8, 12, 5, 1, 1, 3, 3, 1, 5, 15, 4, 2, 2, 6, 5, 0, 0, 0, 3,
0, 16, 0, 9, 0, 0, 8, 1, 2, 2, 3, 4, 17, 4, 1, 4, 6, 4, 3, 15,
2, 2, 13, 1, 9, 7, 7, 13, 10, 11, 2, 15, 7), Day = c(159, 159,
159, 159, 166, 175, 161, 168, 161, 166, 161, 166, 161, 161, 161,
175, 161, 175, 161, 165, 176, 161, 163, 161, 168, 161, 161, 161,
161, 161, 165, 176, 175, 176, 163, 175, 163, 168, 163, 176, 176,
165, 176, 175, 161, 163, 163, 168, 163, 175, 167, 176, 167, 165,
165, 169, 165, 169, 165, 161, 165, 175, 165, 176, 175, 167, 167,
175, 167, 164, 167, 164, 181, 164, 167, 164, 176, 164, 167, 164,
167, 164, 167, 175, 167, 173, 176, 173, 178, 167, 173, 172, 173,
178, 178, 172, 181, 182, 173, 162, 162, 173, 178, 173, 172, 162,
173, 162, 173, 162, 173, 170, 178, 166, 166, 162, 166, 177, 166,
170, 166, 172, 172, 166, 172, 166, 174, 162, 164, 162, 170, 164,
170, 164, 170, 164, 177, 164, 164, 174, 174, 162, 170, 162, 172,
162, 165, 162, 165, 177, 172, 162, 170, 162, 170, 174, 165, 174,
166, 172, 174, 172, 174, 170, 170, 165, 170, 174, 174, 172, 174,
172, 174, 165, 170, 165, 170, 174, 172, 174, 172, 175, 175, 170,
171, 174, 174, 174, 172, 175, 171, 175, 174, 174, 174, 175, 172,
171, 171, 174, 160, 175, 160, 171, 170, 175, 170, 170, 160, 160,
160, 171, 171, 171, 171, 160, 160, 160, 171, 171, 176, 171, 176,
176, 171, 176, 171, 176, 176, 176, 176, 159, 166, 159, 159, 166,
168, 169, 159, 168, 169, 166, 163, 180, 163, 165, 164, 180, 166,
166, 164, 164, 177, 166), NDVI = c(0.187, 0.2, 0.379, 0.253,
0.356, 0.341, 0.268, 0.431, 0.282, 0.181, 0.243, 0.327, 0.26,
0.232, 0.438, 0.275, 0.169, 0.288, 0.138, 0.404, 0.386, 0.194,
0.266, 0.23, 0.333, 0.234, 0.258, 0.333, 0.234, 0.096, 0.354,
0.394, 0.304, 0.162, 0.565, 0.348, 0.345, 0.226, 0.316, 0.312,
0.333, 0.28, 0.325, 0.243, 0.194, 0.29, 0.221, 0.217, 0.122,
0.289, 0.475, 0.048, 0.416, 0.481, 0.159, 0.238, 0.183, 0.28,
0.32, 0.288, 0.24, 0.287, 0.363, 0.367, 0.24, 0.55, 0.441, 0.34,
0.295, 0.23, 0.32, 0.184, 0.306, 0.232, 0.289, 0.341, 0.221,
0.333, 0.17, 0.139, 0.2, 0.204, 0.301, 0.253, -0.08, 0.309, 0.232,
0.23, 0.239, -0.12, 0.26, 0.285, 0.45, 0.348, 0.396, 0.311, 0.318

Re: [R] FW: Selecting undefined column of a data frame (was[BioC]read.phenoData vs read.AnnotatedDataFrame)

2007-08-04 Thread Steven McKinney

Resolution:

To avoid bugs in code due to typos of data frame 
column names that can occur when using the 
'$' extractor, 

   foo - data.frame(Filename = c(a, b))
   foo$FileName
  NULL

a past alternative was to use 
foo[, FileName]
instead of
foo$FileName.

However, this too now silently returns NULL.

 foo[, FileName]
NULL

A modest and simple modification is to use
TRUE for the row index argument.

   foo[T, FileName]
  Error in `[.data.frame`(foo, T, FileName) : 
  undefined columns selected

An error is issued, and the misspelled column name
can more easily be found in debugging the issue.

   all.equal(foo$Filename, foo[T, Filename])
  [1] TRUE

The two accessor methods yield the same result
when column names are spelled correctly.

   all.equal(iris$Species, iris[T, Species])
  [1] TRUE

Other solutions no doubt exist.  Currently a single
argument to [.data.frame will throw an error
if the argument does not match a column name.

   foo[FileName]
  Error in `[.data.frame`(foo, FileName) : 
  undefined columns selected


 sessionInfo()
R version 2.5.1 (2007-06-27) 
i386-pc-mingw32 

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
States.1252;LC_MONETARY=English_United 
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods  
[7] base 
 



Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: smckinney +at+ bccrc +dot+ ca

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3
Canada




-Original Message-
From: [EMAIL PROTECTED] on behalf of Steven McKinney
Sent: Fri 8/3/2007 11:10 AM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] FW: Selecting undefined column of a data frame 
(was[BioC]read.phenoData vs read.AnnotatedDataFrame)
 


I see now that for my example


 foo - data.frame(Filename = c(a, b))
 foo[, FileName]
NULL

the issue is in this clause of the 
[.data.frame extractor.

The lines
if (drop  length(y) == 1L) 
return(.subset2(y, 1L)) 
return the NULL result just before the
error check
cols - names(y)
if (any(is.na(cols))) 
stop(undefined columns selected)
is performed.

Is this intended behaviour, or has a logical
bug crept into the [.data.frame extractor?


if (missing(i)) {
if (missing(j)  drop  length(x) == 1L) 
return(.subset2(x, 1L))
y - if (missing(j)) 
x
else .subset(x, j)
if (drop  length(y) == 1L) 
return(.subset2(y, 1L)) ## This returns a result before undefined 
columns check is done.  Is this intended?
cols - names(y)
if (any(is.na(cols))) 
stop(undefined columns selected)
if (any(duplicated(cols))) 
names(y) - make.unique(cols)
nrow - .row_names_info(x, 2L)
if (drop  !mdrop  nrow == 1L) 
return(structure(y, class = NULL, row.names = NULL))
else return(structure(y, class = oldClass(x), row.names = 
.row_names_info(x, 
0L)))
}




 sessionInfo()
R version 2.5.1 (2007-06-27) 
powerpc-apple-darwin8.9.1 

locale:
en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   
base 

other attached packages:
 plotrix lme4   Matrix  lattice 
 2.2-3  0.99875-4 0.999375-0 0.16-2 

Should this discussion move to R-devel?

Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: smckinney +at+ bccrc +dot+ ca

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3
Canada




-Original Message-
From: [EMAIL PROTECTED] on behalf of Steven McKinney
Sent: Fri 8/3/2007 10:37 AM
To: r-help@stat.math.ethz.ch
Subject: [R] FW: Selecting undefined column of a data frame (was 
[BioC]read.phenoData vs read.AnnotatedDataFrame)
 
Hi all,

What are current methods people use in R to identify
mis-spelled column names when selecting columns
from a data frame?

Alice Johnson recently tackled this issue
(see [BioC] posting below).

Due to a mis-spelled column name (FileName
instead of Filename) which produced no warning,
Alice spent a fair amount of time tracking down
this bug.  With my fumbling fingers I'll be tracking
down such a bug soon too.

Is there any options() setting, or debug technique
that will flag data frame column extractions that
reference a non-existent column?  It seems to me
that the [.data.frame extractor used to throw an
error if given a mis-spelled variable name, and I
still see lines of code in [.data.frame such as

if (any(is.na(cols))) 
stop(undefined columns selected)



In R 2.5.1 a NULL is silently returned.

 foo - data.frame(Filename = c(a, b))
 foo

[R] FW: Selecting undefined column of a data frame (was [BioC] read.phenoData vs read.AnnotatedDataFrame)

2007-08-03 Thread Steven McKinney

Hi all,

What are current methods people use in R to identify
mis-spelled column names when selecting columns
from a data frame?

Alice Johnson recently tackled this issue
(see [BioC] posting below).

Due to a mis-spelled column name (FileName
instead of Filename) which produced no warning,
Alice spent a fair amount of time tracking down
this bug.  With my fumbling fingers I'll be tracking
down such a bug soon too.

Is there any options() setting, or debug technique
that will flag data frame column extractions that
reference a non-existent column?  It seems to me
that the [.data.frame extractor used to throw an
error if given a mis-spelled variable name, and I
still see lines of code in [.data.frame such as

if (any(is.na(cols))) 
stop(undefined columns selected)



In R 2.5.1 a NULL is silently returned.

 foo - data.frame(Filename = c(a, b))
 foo[, FileName]
NULL

Has something changed so that the code lines
if (any(is.na(cols))) 
stop(undefined columns selected)
in [.data.frame no longer work properly (if
I am understanding the intention properly)?

If not, could  [.data.frame check an
options() variable setting (say
warn.undefined.colnames) and throw a warning
if a non-existent column name is referenced?




 sessionInfo()
R version 2.5.1 (2007-06-27) 
powerpc-apple-darwin8.9.1 

locale:
en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   
base 

other attached packages:
 plotrix lme4   Matrix  lattice 
 2.2-3  0.99875-4 0.999375-0 0.16-2 
 



Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: smckinney +at+ bccrc +dot+ ca

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3
Canada




-Original Message-
From: [EMAIL PROTECTED] on behalf of Johnstone, Alice
Sent: Wed 8/1/2007 7:20 PM
To: [EMAIL PROTECTED]
Subject: Re: [BioC] read.phenoData vs read.AnnotatedDataFrame
 
 For interest sake, I have found out why I wasn't getting my expected
results when using read.AnnotatedDataFrame
Turns out the error was made in the ReadAffy command, where I specified
the filenames to be read from my AnnotatedDataFrame object.  There was a
typo error with a capital N ($FileName) rather than lowercase n
($Filename) as in my target file..whoops.  However this meant the
filename argument was ignored without the error message(!) and instead
of using the information in the AnnotatedDataFrame object (which
included filenames, but not alphabetically) it read the .cel files in
alphabetical order from the working directory - hence the wrong file was
given the wrong label (given by the order of Annotated object) and my
comparisons were confused without being obvious as to why or where.
Our solution: specify that filename is as.character so assignment of
file to target is correct(after correcting $Filename) now that using
read.AnnotatedDataFrame rather than readphenoData.

Data-ReadAffy(filenames=as.character(pData(pd)$Filename),phenoData=pd)

Hurrah!

It may be beneficial to others, that if the filename argument isn't
specified, that filenames are read from the phenoData object if included
here.

Thanks!

-Original Message-
From: Martin Morgan [mailto:[EMAIL PROTECTED] 
Sent: Thursday, 26 July 2007 11:49 a.m.
To: Johnstone, Alice
Cc: [EMAIL PROTECTED]
Subject: Re: [BioC] read.phenoData vs read.AnnotatedDataFrame

Hi Alice --

Johnstone, Alice [EMAIL PROTECTED] writes:

 Using R2.5.0 and Bioconductor I have been following code to analysis 
 Affymetrix expression data: 2 treatments vs control.  The original 
 code was run last year and used the read.phenoData command, however 
 with the newer version I get the error message Warning messages:
 read.phenoData is deprecated, use read.AnnotatedDataFrame instead The 
 phenoData class is deprecated, use AnnotatedDataFrame (with
 ExpressionSet) instead
  
 I use the read.AnnotatedDataFrame command, but when it comes to the 
 end of the analysis the comparison of the treatment to the controls 
 gets mixed up compared to what you get using the original 
 read.phenoData ie it looks like the 3 groups get labelled wrong and so

 the comparisons are different (but they can still be matched up).
 My questions are,
 1) do you need to set up your target file differently when using 
 read.AnnotatedDataFrame - what is the standard format?

I can't quite tell where things are going wrong for you, so it would
help if you can narrow down where the problem occurs.  I think
read.AnnotatedDataFrame should be comparable to read.phenoData. Does

 pData(pd)

look right? What about

 pData(Data)

and

 pData(eset.rma)

? It's not important but pData(pd)$Target is the same as pd$Target.
Since the analysis is on eset.rma, it probably makes sense to use the
pData from there to construct your design matrix

 targs-factor

Re: [R] FW: Selecting undefined column of a data frame (was [BioC]read.phenoData vs read.AnnotatedDataFrame)

2007-08-03 Thread Steven McKinney



I see now that for my example


 foo - data.frame(Filename = c(a, b))
 foo[, FileName]
NULL

the issue is in this clause of the 
[.data.frame extractor.

The lines
if (drop  length(y) == 1L) 
return(.subset2(y, 1L)) 
return the NULL result just before the
error check
cols - names(y)
if (any(is.na(cols))) 
stop(undefined columns selected)
is performed.

Is this intended behaviour, or has a logical
bug crept into the [.data.frame extractor?


if (missing(i)) {
if (missing(j)  drop  length(x) == 1L) 
return(.subset2(x, 1L))
y - if (missing(j)) 
x
else .subset(x, j)
if (drop  length(y) == 1L) 
return(.subset2(y, 1L)) ## This returns a result before undefined 
columns check is done.  Is this intended?
cols - names(y)
if (any(is.na(cols))) 
stop(undefined columns selected)
if (any(duplicated(cols))) 
names(y) - make.unique(cols)
nrow - .row_names_info(x, 2L)
if (drop  !mdrop  nrow == 1L) 
return(structure(y, class = NULL, row.names = NULL))
else return(structure(y, class = oldClass(x), row.names = 
.row_names_info(x, 
0L)))
}




 sessionInfo()
R version 2.5.1 (2007-06-27) 
powerpc-apple-darwin8.9.1 

locale:
en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   
base 

other attached packages:
 plotrix lme4   Matrix  lattice 
 2.2-3  0.99875-4 0.999375-0 0.16-2 

Should this discussion move to R-devel?

Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: smckinney +at+ bccrc +dot+ ca

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3
Canada




-Original Message-
From: [EMAIL PROTECTED] on behalf of Steven McKinney
Sent: Fri 8/3/2007 10:37 AM
To: r-help@stat.math.ethz.ch
Subject: [R] FW: Selecting undefined column of a data frame (was 
[BioC]read.phenoData vs read.AnnotatedDataFrame)
 
Hi all,

What are current methods people use in R to identify
mis-spelled column names when selecting columns
from a data frame?

Alice Johnson recently tackled this issue
(see [BioC] posting below).

Due to a mis-spelled column name (FileName
instead of Filename) which produced no warning,
Alice spent a fair amount of time tracking down
this bug.  With my fumbling fingers I'll be tracking
down such a bug soon too.

Is there any options() setting, or debug technique
that will flag data frame column extractions that
reference a non-existent column?  It seems to me
that the [.data.frame extractor used to throw an
error if given a mis-spelled variable name, and I
still see lines of code in [.data.frame such as

if (any(is.na(cols))) 
stop(undefined columns selected)



In R 2.5.1 a NULL is silently returned.

 foo - data.frame(Filename = c(a, b))
 foo[, FileName]
NULL

Has something changed so that the code lines
if (any(is.na(cols))) 
stop(undefined columns selected)
in [.data.frame no longer work properly (if
I am understanding the intention properly)?

If not, could  [.data.frame check an
options() variable setting (say
warn.undefined.colnames) and throw a warning
if a non-existent column name is referenced?




 sessionInfo()
R version 2.5.1 (2007-06-27) 
powerpc-apple-darwin8.9.1 

locale:
en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   
base 

other attached packages:
 plotrix lme4   Matrix  lattice 
 2.2-3  0.99875-4 0.999375-0 0.16-2 
 



Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: smckinney +at+ bccrc +dot+ ca

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3
Canada




-Original Message-
From: [EMAIL PROTECTED] on behalf of Johnstone, Alice
Sent: Wed 8/1/2007 7:20 PM
To: [EMAIL PROTECTED]
Subject: Re: [BioC] read.phenoData vs read.AnnotatedDataFrame
 
 For interest sake, I have found out why I wasn't getting my expected
results when using read.AnnotatedDataFrame
Turns out the error was made in the ReadAffy command, where I specified
the filenames to be read from my AnnotatedDataFrame object.  There was a
typo error with a capital N ($FileName) rather than lowercase n
($Filename) as in my target file..whoops.  However this meant the
filename argument was ignored without the error message(!) and instead
of using the information in the AnnotatedDataFrame object (which
included filenames, but not alphabetically) it read the .cel files in
alphabetical order from the working directory - hence the wrong file was
given the wrong label (given by the order

Re: [R] FW: Selecting undefined column of a data frame (was [BioC] read.phenoData vs read.AnnotatedDataFrame)

2007-08-03 Thread Steven McKinney

Thanks Prof Ripley,

I used double indexing (if I understand the doc correctly)
so my call was

 foo[, FileName]

I traced through each line of `[.data.frame`
following the sequence of commands executed
for my call.

In the code section

if (missing(i)) {
if (missing(j)  drop  length(x) == 1L)
return(.subset2(x, 1L))
y - if (missing(j))
x
else .subset(x, j)
if (drop  length(y) == 1L)
return(.subset2(y, 1L)) ## This returns a result before undefined 
columns check is done.  Is this intended?
cols - names(y)
if (any(is.na(cols)))
stop(undefined columns selected)
if (any(duplicated(cols)))
names(y) - make.unique(cols)
nrow - .row_names_info(x, 2L)
if (drop  !mdrop  nrow == 1L)
return(structure(y, class = NULL, row.names = NULL))
else return(structure(y, class = oldClass(x), row.names = 
.row_names_info(x,
0L)))
}

the return happened after execution of
if (drop  length(y) == 1L)
return(.subset2(y, 1L))
before the check on column names.

Shouldn't the check on column names
cols - names(y)
if (any(is.na(cols)))
stop(undefined columns selected)
occur before
if (drop  length(y) == 1L)
return(.subset2(y, 1L))
rather than after?




-Original Message-
From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]
Sent: Fri 8/3/2007 12:25 PM
To: Steven McKinney
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] FW: Selecting undefined column of a data frame (was [BioC] 
read.phenoData vs read.AnnotatedDataFrame)
 
You are reading the wrong part of the code for your argument list:

  foo[FileName]
Error in `[.data.frame`(foo, FileName) : undefined columns selected

[.data.frame is one of the most complex functions in R, and does many 
different things depending on which arguments are supplied.


On Fri, 3 Aug 2007, Steven McKinney wrote:

 Hi all,

 What are current methods people use in R to identify
 mis-spelled column names when selecting columns
 from a data frame?

 Alice Johnson recently tackled this issue
 (see [BioC] posting below).

 Due to a mis-spelled column name (FileName
 instead of Filename) which produced no warning,
 Alice spent a fair amount of time tracking down
 this bug.  With my fumbling fingers I'll be tracking
 down such a bug soon too.

 Is there any options() setting, or debug technique
 that will flag data frame column extractions that
 reference a non-existent column?  It seems to me
 that the [.data.frame extractor used to throw an
 error if given a mis-spelled variable name, and I
 still see lines of code in [.data.frame such as

 if (any(is.na(cols)))
stop(undefined columns selected)



 In R 2.5.1 a NULL is silently returned.

 foo - data.frame(Filename = c(a, b))
 foo[, FileName]
 NULL

 Has something changed so that the code lines
 if (any(is.na(cols)))
stop(undefined columns selected)
 in [.data.frame no longer work properly (if
 I am understanding the intention properly)?

 If not, could  [.data.frame check an
 options() variable setting (say
 warn.undefined.colnames) and throw a warning
 if a non-existent column name is referenced?




 sessionInfo()
 R version 2.5.1 (2007-06-27)
 powerpc-apple-darwin8.9.1

 locale:
 en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   
 base

 other attached packages:
 plotrix lme4   Matrix  lattice
 2.2-3  0.99875-4 0.999375-0 0.16-2




 Steven McKinney

 Statistician
 Molecular Oncology and Breast Cancer Program
 British Columbia Cancer Research Centre

 email: smckinney +at+ bccrc +dot+ ca

 tel: 604-675-8000 x7561

 BCCRC
 Molecular Oncology
 675 West 10th Ave, Floor 4
 Vancouver B.C.
 V5Z 1L3
 Canada




 -Original Message-
 From: [EMAIL PROTECTED] on behalf of Johnstone, Alice
 Sent: Wed 8/1/2007 7:20 PM
 To: [EMAIL PROTECTED]
 Subject: Re: [BioC] read.phenoData vs read.AnnotatedDataFrame

 For interest sake, I have found out why I wasn't getting my expected
 results when using read.AnnotatedDataFrame
 Turns out the error was made in the ReadAffy command, where I specified
 the filenames to be read from my AnnotatedDataFrame object.  There was a
 typo error with a capital N ($FileName) rather than lowercase n
 ($Filename) as in my target file..whoops.  However this meant the
 filename argument was ignored without the error message(!) and instead
 of using the information in the AnnotatedDataFrame object (which
 included filenames, but not alphabetically) it read the .cel files in
 alphabetical order from the working directory - hence the wrong file was
 given the wrong label (given by the order of Annotated object) and my
 comparisons were confused without being obvious as to why or where.
 Our solution: specify that filename is as.character so assignment

Re: [R] FW: Selecting undefined column of a data frame (was [BioC] read.phenoData vs read.AnnotatedDataFrame)

2007-08-03 Thread Steven McKinney

 -Original Message-
 From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]
 Sent: Fri 8/3/2007 1:05 PM
 To: Steven McKinney
 Cc: r-help@stat.math.ethz.ch
 Subject: Re: [R] FW: Selecting undefined column of a data frame (was [BioC] 
 read.phenoData vs read.AnnotatedDataFrame)

 I've since seen your followup a more detailed explanation may help.
 The path through the code for your argument list does not go where you 
 quoted, and there is a reason for it.

Using a copy of  [.data.frame with browser() I have traced
the flow of execution. (My copy with the browser command is at the end
of this email)

   foo[, FileName]
  Called from: `[.data.frame`(foo, , FileName)
  Browse[1] n
  debug: mdrop - missing(drop)
  Browse[1] n
  debug: Narg - nargs() - (!mdrop)
  Browse[1] n
  debug: if (Narg  3) {
  if (!mdrop) 
  warning(drop argument will be ignored)
  if (missing(i)) 
  return(x)
  if (is.matrix(i)) 
  return(as.matrix(x)[i])
  y - NextMethod([)
  cols - names(y)
  if (!is.null(cols)  any(is.na(cols))) 
  stop(undefined columns selected)
  if (any(duplicated(cols))) 
  names(y) - make.unique(cols)
  return(structure(y, class = oldClass(x), row.names = .row_names_info(x, 
  0L)))
  }
  Browse[1] n
  debug: if (missing(i)) {
  if (missing(j)  drop  length(x) == 1L) 
  return(.subset2(x, 1L))
  y - if (missing(j)) 
  x
  else .subset(x, j)
  if (drop  length(y) == 1L) 
  return(.subset2(y, 1L))
  cols - names(y)
  if (any(is.na(cols))) 
  stop(undefined columns selected)
  if (any(duplicated(cols))) 
  names(y) - make.unique(cols)
  nrow - .row_names_info(x, 2L)
  if (drop  !mdrop  nrow == 1L) 
  return(structure(y, class = NULL, row.names = NULL))
  else return(structure(y, class = oldClass(x), row.names = 
.row_names_info(x, 
  0L)))
  }
  Browse[1] n
  debug: if (missing(j)  drop  length(x) == 1L) return(.subset2(x, 
  1L))
  Browse[1] n
  debug: y - if (missing(j)) x else .subset(x, j)
  Browse[1] n
  debug: if (drop  length(y) == 1L) return(.subset2(y, 1L))
  Browse[1] n
  NULL

So `[.data.frame` is exiting after executing 
+ if (drop  length(y) == 1L) 
+ return(.subset2(y, 1L)) ## This returns a result before undefined 
columns check is done.  Is this intended?

Couldn't the error check
+ cols - names(y)
+ if (any(is.na(cols))) 
+ stop(undefined columns selected)
be done before the above return()?

What would break if the error check on column names was done
before returning a NULL result due to incorrect column name spelling?

Why should

 foo[, FileName]
NULL

differ from

 foo[seq(nrow(foo)), FileName]
Error in `[.data.frame`(foo, seq(nrow(foo)), FileName) : 
undefined columns selected

Thank you for your explanations.

 Generally when you extract in R and ask for an non-existent index you get 
 NA or NULL as the result (and no warning), e.g.

  y - list(x=1, y=2)
  y[[z]]
 NULL

 Because data frames 'must' have (column) names, they are a partial 
 exception and when the result is a data frame you get an error if it would 
 contain undefined columns.

 But in the case of foo[, FileName], the result is a single column and so 
 will not have a name: there seems no reason to be different from

  foo[[FileName]]
 NULL
  foo$FileName
 NULL

 which similarly select a single column.  At one time they were different 
 in R, for no documented reason.

 On Fri, 3 Aug 2007, Prof Brian Ripley wrote:

  You are reading the wrong part of the code for your argument list:

   foo[FileName]
  Error in `[.data.frame`(foo, FileName) : undefined columns selected

  [.data.frame is one of the most complex functions in R, and does many 
  different things depending on which arguments are supplied.

  On Fri, 3 Aug 2007, Steven McKinney wrote:

  Hi all,

  What are current methods people use in R to identify
  mis-spelled column names when selecting columns
  from a data frame?

  Alice Johnson recently tackled this issue
  (see [BioC] posting below).

  Due to a mis-spelled column name (FileName
  instead of Filename) which produced no warning,
  Alice spent a fair amount of time tracking down
  this bug.  With my fumbling fingers I'll be tracking
  down such a bug soon too.

  Is there any options() setting, or debug technique
  that will flag data frame column extractions that
  reference a non-existent column?  It seems to me
  that the [.data.frame extractor used to throw an
  error if given a mis-spelled variable name, and I
  still see lines of code in [.data.frame such as

  if (any(is.na(cols)))
 stop(undefined columns selected)

  In R 2.5.1 a NULL is silently returned.

  foo - data.frame(Filename = c(a, b))
  foo[, FileName]
  NULL

  Has something changed so that the code lines
  if (any(is.na(cols

Re: [R] FW: Selecting undefined column of a data frame (was [BioC] read.phenoData vs read.AnnotatedDataFrame)

2007-08-03 Thread Steven McKinney


 What would break is that three methods for doing the same thing would
 give different answers.
 
 Please do have the courtesy to actually read the detailed explanation you
 are given.

Sorry Prof. Ripley, I am attempting to read carefully, as this
issue has deeper coding/debugging implications, and as you
point out, 
  [.data.frame is one of the most complex functions in R
so please bear with me.  This change in behaviour has 
taken away a side-effect debugging tool, discussed below.


 
 
 On Fri, 3 Aug 2007, Steven McKinney wrote:
 
 
  -Original Message-
  From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]
  Sent: Fri 8/3/2007 1:05 PM
  To: Steven McKinney
  Cc: r-help@stat.math.ethz.ch
  Subject: Re: [R] FW: Selecting undefined column of a data frame (was 
  [BioC] read.phenoData vs read.AnnotatedDataFrame)
 
  I've since seen your followup a more detailed explanation may help.
  The path through the code for your argument list does not go where you
  quoted, and there is a reason for it.
 
 
  Generally when you extract in R and ask for an non-existent index you get
  NA or NULL as the result (and no warning), e.g.
 
  y - list(x=1, y=2)
  y[[z]]
  NULL
 
  Because data frames 'must' have (column) names, they are a partial
  exception and when the result is a data frame you get an error if it would
  contain undefined columns.
 
  But in the case of foo[, FileName], the result is a single column and so
  will not have a name: there seems no reason to be different from
 
  foo[[FileName]]
  NULL
  foo$FileName
  NULL
 
  which similarly select a single column.  At one time they were different
  in R, for no documented reason.


This difference provided a side-effect debugging tool, in that where

   bar - foo[, FileName]

used to throw an error, alerting as to a typo, it now does not.

Having been burned by NULL results due to typos in code lines using
the $ extractor such as
 
   bar - foo$FileName

I learned to use
   bar - foo[, FileName]
to help cut down on typo bugs.  With the ubiquity of
camelCase object names, this is a constant typing bug hazard.


I am wondering what to do now to double check spelling
when accessing columns of a dataframe.

If [.data.frame stays as is, can a debug mechanism
be implemented in R that forces strict adherence
to existing list names in debug mode?  This would also help debug
typos in camelCase names when using the $ and [[
extractors and accessors.

Are there other debugging tools already in R that
can help point out such camelCase list element
name typos?



 
 
  On Fri, 3 Aug 2007, Prof Brian Ripley wrote:
 
  You are reading the wrong part of the code for your argument list:
 
   foo[FileName]
  Error in `[.data.frame`(foo, FileName) : undefined columns selected
 
  [.data.frame is one of the most complex functions in R, and does many
  different things depending on which arguments are supplied.
 
 
  On Fri, 3 Aug 2007, Steven McKinney wrote:
 
  Hi all,
 
  What are current methods people use in R to identify
  mis-spelled column names when selecting columns
  from a data frame?
 
  Alice Johnson recently tackled this issue
  (see [BioC] posting below).
 
  Due to a mis-spelled column name (FileName
  instead of Filename) which produced no warning,
  Alice spent a fair amount of time tracking down
  this bug.  With my fumbling fingers I'll be tracking
  down such a bug soon too.
 
  Is there any options() setting, or debug technique
  that will flag data frame column extractions that
  reference a non-existent column?  It seems to me
  that the [.data.frame extractor used to throw an
  error if given a mis-spelled variable name, and I
  still see lines of code in [.data.frame such as
 
  if (any(is.na(cols)))
 stop(undefined columns selected)
 
 
 
  In R 2.5.1 a NULL is silently returned.
 
  foo - data.frame(Filename = c(a, b))
  foo[, FileName]
  NULL
 
  Has something changed so that the code lines
  if (any(is.na(cols)))
 stop(undefined columns selected)
  in [.data.frame no longer work properly (if
  I am understanding the intention properly)?
 
  If not, could  [.data.frame check an
  options() variable setting (say
  warn.undefined.colnames) and throw a warning
  if a non-existent column name is referenced?
 
 
 
 
  sessionInfo()
  R version 2.5.1 (2007-06-27)
  powerpc-apple-darwin8.9.1
 
  locale:
  en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
 
  attached base packages:
  [1] stats graphics  grDevices utils datasets  methods
  base
 
  other attached packages:
  plotrix lme4   Matrix  lattice
  2.2-3  0.99875-4 0.999375-0 0.16-2
 
 
 
 
  Steven McKinney
 
  Statistician
  Molecular Oncology and Breast Cancer Program
  British Columbia Cancer Research Centre
 
  email: smckinney +at+ bccrc +dot+ ca
 
  tel: 604-675-8000 x7561
 
  BCCRC
  Molecular Oncology
  675 West 10th Ave, Floor 4
  Vancouver B.C.
  V5Z 1L3
  Canada
 
 
 
 
  --
  Brian D. Ripley

Re: [R] FW: Selecting undefined column of a data frame (was [BioC]read.phenoData vs read.AnnotatedDataFrame)

2007-08-03 Thread Steven McKinney

Hi Bert,

 -Original Message-
 From: Bert Gunter [mailto:[EMAIL PROTECTED]
 Sent: Fri 8/3/2007 3:19 PM
 To: Steven McKinney; r-help@stat.math.ethz.ch
 Subject: RE: [R] FW: Selecting undefined column of a data frame (was 
 [BioC]read.phenoData vs read.AnnotatedDataFrame)

 I suspect you'll get some creative answers, but if all you're worried about
 is whether a column exists before you do something with it, what's wrong
 with:

 nm - ... ## a character vector of names
 if(!all(nm %in% names(yourdata))) ## complain
 else ## do something

 I think this is called defensive programming.

This is a good example of good defensive programming.
I do indeed check variable/object names whenever
obtaining them from an external source (user input,
file input, a list in code).

I was able to practice a defensive programming style in the past
by using
  bar - foo[, FileName]
instead of
  bar - foo$FileName

but this has changed recently, so I need to figure out
some other mechanisms.

R is such a productive language, but this change will
lead many of us to chase elusive typos that used to
get revealed.

I'm hoping that some kind of explicit data frame variable
checking mechanism might be introduced since we've
lost this one.  

It would also be great to have such a
mechanism to help catch list access and extraction
errors.  Why should
foo$FileName
always quietly return NULL?

I'm not sure why the following incongruity is okay.

 foo - matrix(1:4, nrow = 2)
 dimnames(foo) - list(NULL, c(a, b))
 bar - foo[, A]
Error: subscript out of bounds

 foo.df - as.data.frame(foo)
 foo.df
  a b
1 1 3
2 2 4
 bar - foo.df[, A]
 bar
NULL

It is a lot of extra typing to wrap every command in
extra code, but more of that will need to happen
going forward.

Steve McKinney

 Bert Gunter
 Genentech

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Steven McKinney
 Sent: Friday, August 03, 2007 10:38 AM
 To: r-help@stat.math.ethz.ch
 Subject: [R] FW: Selecting undefined column of a data frame (was
 [BioC]read.phenoData vs read.AnnotatedDataFrame)

 Hi all,

 What are current methods people use in R to identify
 mis-spelled column names when selecting columns
 from a data frame?

 Alice Johnson recently tackled this issue
 (see [BioC] posting below).

 Due to a mis-spelled column name (FileName
 instead of Filename) which produced no warning,
 Alice spent a fair amount of time tracking down
 this bug.  With my fumbling fingers I'll be tracking
 down such a bug soon too.

 Is there any options() setting, or debug technique
 that will flag data frame column extractions that
 reference a non-existent column?  It seems to me
 that the [.data.frame extractor used to throw an
 error if given a mis-spelled variable name, and I
 still see lines of code in [.data.frame such as

 if (any(is.na(cols))) 
 stop(undefined columns selected)

 In R 2.5.1 a NULL is silently returned.

  foo - data.frame(Filename = c(a, b))
  foo[, FileName]
 NULL

 Has something changed so that the code lines
 if (any(is.na(cols))) 
 stop(undefined columns selected)
 in [.data.frame no longer work properly (if
 I am understanding the intention properly)?

 If not, could  [.data.frame check an
 options() variable setting (say
 warn.undefined.colnames) and throw a warning
 if a non-existent column name is referenced?

  sessionInfo()
 R version 2.5.1 (2007-06-27) 
 powerpc-apple-darwin8.9.1 

 locale:
 en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods
 base 

 other attached packages:
  plotrix lme4   Matrix  lattice 
  2.2-3  0.99875-4 0.999375-0 0.16-2 

 Steven McKinney

 Statistician
 Molecular Oncology and Breast Cancer Program
 British Columbia Cancer Research Centre

 email: smckinney +at+ bccrc +dot+ ca

 tel: 604-675-8000 x7561

 BCCRC
 Molecular Oncology
 675 West 10th Ave, Floor 4
 Vancouver B.C. 
 V5Z 1L3
 Canada

 -Original Message-
 From: [EMAIL PROTECTED] on behalf of Johnstone, Alice
 Sent: Wed 8/1/2007 7:20 PM
 To: [EMAIL PROTECTED]
 Subject: Re: [BioC] read.phenoData vs read.AnnotatedDataFrame

  For interest sake, I have found out why I wasn't getting my expected
 results when using read.AnnotatedDataFrame
 Turns out the error was made in the ReadAffy command, where I specified
 the filenames to be read from my AnnotatedDataFrame object.  There was a
 typo error with a capital N ($FileName) rather than lowercase n
 ($Filename) as in my target file..whoops.  However this meant the
 filename argument was ignored without the error message(!) and instead
 of using the information in the AnnotatedDataFrame object (which
 included filenames, but not alphabetically) it read the .cel files in
 alphabetical order from the working directory - hence the wrong file was
 given the wrong label

Re: [R] Extracting a website text content using R

2007-08-01 Thread Steven McKinney

-Original Message-
From: [EMAIL PROTECTED] on behalf of Am Stat
Sent: Wed 8/1/2007 2:19 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Extracting a website text content using R

Dear useR,

Just wandering whether it is possible that there is any function in R could
let me get the text contents for a certain website.

Thanks a lot!

Best,

Leon

Is this what you had in mind?

 foo - scan(url(http://cran.r-project.org/;), what = character)
Read 69 items
 paste(unlist(foo), collapse =  )
[1] !DOCTYPE HTML PUBLIC -//IETF//DTD HTML//EN  html head titleThe 
Comprehensive R Archive Network/title link rel=\icon\ href=\favicon.ico\ 
type=\image/x-icon\ link rel=\shortcut icon\ href=\favicon.ico\ 
type=\image/x-icon\ link rel=\stylesheet\ type=\text/css\ 
href=\R.css\ /head FRAMESET cols=\1*, 4*\ border=0 FRAMESET 
rows=\120, 1*\ FRAME src=\logo.html\ name=\logo\ frameborder=0 FRAME 
src=\navbar.html\ name=\contents\ frameborder=0 /FRAMESET FRAME 
src=\banner.shtml\ name=\banner\ frameborder=0 noframes h1The 
Comprehensive R Archive Network/h1 Your browser seems not to support frames, 
here is the A href=\navbar.html\contents page/A of CRAN. /noframes 
/FRAMESET

Try the search phrase

cran scan url

in Google for more hits on
info about R functions that
can deal with URLs.

In R try

 apropos(URL)
 [1] contourLines   URLdecode  URLencode  browseURL  
contrib.urlmain.help.url  url.show  
 [8] loadURLread.table.url scan.url   source.url url  

SteveM

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Design matrix question

2007-05-16 Thread Steven McKinney


Hi useRs,

Perhaps I am having a senior moment?

I have a nested variable situation to model,

toy example:


 df - data.frame(A = factor(c(a, a, x, x), levels = c(x, a)),
+  B = factor(c(b, x, x, x), levels = c(x, b)))
 
 df
  A B
1 a b
2 a x
3 x x
4 x x

So of course the full design matrix is singular

 model.matrix(~ A * B, df)
  (Intercept) Aa Bb Aa:Bb
1   1  1  1 1
2   1  1  0 0
3   1  0  0 0
4   1  0  0 0
attr(,assign)
[1] 0 1 2 3
attr(,contrasts)
attr(,contrasts)$A
[1] contr.treatment

attr(,contrasts)$B
[1] contr.treatment


I'd like not to have the B term main effect in the model

 model.matrix(~ A * B - B, df)
  (Intercept) Aa Ax:Bb Aa:Bb
1   1  1 0 1
2   1  1 0 0
3   1  0 0 0
4   1  0 0 0
attr(,assign)
[1] 0 1 2 2
attr(,contrasts)
attr(,contrasts)$A
[1] contr.treatment

attr(,contrasts)$B
[1] contr.treatment

 

How do I get model.matrix to not add that
column of zeroes?

Why does model.matrix add that column of zeroes?

Is this a bug, or a senior moment?





Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: [EMAIL PROTECTED]

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3
Canada

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] reodering factor

2007-05-03 Thread Steven McKinney


One way to reorder a factor is to define a new
factor and specify the order of levels using
the levels argument of the factor() function.

The first category specified for the levels
argument will be the reference category in
model fits such as with lm().


 mydata - data.frame(y = c(runif(10), runif(10) + 10), grp = c(rep(A, 10), 
 rep(B, 10)))
 mydata
y grp
1   0.0808684   A
2   0.2930649   A
3   0.4671063   A
4   0.7815386   A
5   0.5360262   A
6   0.8092338   A
7   0.9965648   A
8   0.3549031   A
9   0.3426956   A
10  0.2988377   A
11 10.6528479   B
12 10.7118101   B
13 10.4484731   B
14 10.9638309   B
15 10.7650812   B
16 10.6355089   B
17 10.7003755   B
18 10.2147930   B
19 10.8901356   B
20 10.6319798   B
 lm(y ~ grp, data = mydata)

Call:
lm(formula = y ~ grp, data = mydata)

Coefficients:
(Intercept) grpB  
 0.4961  10.1654  

 mydata$grp2 - factor(mydata$grp, levels = c(B, A))
 mydata
y grp grp2
1   0.0808684   AA
2   0.2930649   AA
3   0.4671063   AA
4   0.7815386   AA
5   0.5360262   AA
6   0.8092338   AA
7   0.9965648   AA
8   0.3549031   AA
9   0.3426956   AA
10  0.2988377   AA
11 10.6528479   BB
12 10.7118101   BB
13 10.4484731   BB
14 10.9638309   BB
15 10.7650812   BB
16 10.6355089   BB
17 10.7003755   BB
18 10.2147930   BB
19 10.8901356   BB
20 10.6319798   BB
 lm(y ~ grp2, data = mydata)

Call:
lm(formula = y ~ grp2, data = mydata)

Coefficients:
(Intercept)grp2A  
  10.66   -10.17  

 



Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: [EMAIL PROTECTED]

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3
Canada




-Original Message-
From: [EMAIL PROTECTED] on behalf of John Sorkin
Sent: Thu 5/3/2007 7:10 PM
To: r-help@stat.math.ethz.ch
Subject: [R] reodering factor
 
R 2.4.1 
Windows XP

How does one reorder a factor?


I have the following data:
 factor(data$Group)
 [1] ZZ ZT ZT ZZ ZZ ZT ZZ ZZ ZT ZT ZT ZT ZZ ZT ZT ZZ ZT ZZ ZT ZZ ZT ZT ZZ ZZ ZT 
ZZ ZT ZZ ZT ZZ ZZ ZT ZZ ZT
Levels: ZT ZZ

In my regression (i.e. lm(y~data$Group) ZT is taken as the reference category 
and I get an estimate for ZZ. I would like ZZ to be the reference category and 
obtain an estimate for ZT.

Thank,
John

John Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

Confidentiality Statement:
This email message, including any attachments, is for the so...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Computing an ordering on subsets of a data frame

2007-04-19 Thread Steven McKinney

Hi Lukas,

Using by() or its cousins tapply() etc. is tricky,
as you need to properly merge results back into X.

You can do that by adding a key ID variable to X, 
and carrying along that key ID variable in calls
to by() etc., though I haven't tested out a method.

You can also create a new column in X to hold the
results, and then sort the subsections of X in a
for() loop.

 X - data.frame(A = c(1,1,1,2,2,2,3,3,3), B = c(2,3,4,3,1,1,2,1,3))
 X
  A B
1 1 2
2 1 3
3 1 4
4 2 3
5 2 1
6 2 1
7 3 2
8 3 1
9 3 3
 
 X$C - rep(as.numeric(NA), nrow(X))
 
 sortLevels - unique(X$A)
 
 for(i in seq(along = sortLevels)) {
+   sortIdxp - X$A == sortLevels[i]
+   X$C[sortIdxp] - rank(X$B[sortIdxp], ties.method = random)
+ }
 X
  A B C
1 1 2 1
2 1 3 2
3 1 4 3
4 2 3 3
5 2 1 1
6 2 1 2
7 3 2 2
8 3 1 1
9 3 3 3
 

Merging results back in after using
tapply() or by() is harder if your
data frame is in random order, but the
for() loop approach with indexing
still works fine.

 set.seed(123)
 Y - X[sample(9), ]
 Y
  A B C
3 1 4 3
7 3 2 2
9 3 3 3
6 2 1 2
5 2 1 1
1 1 2 1
2 1 3 2
8 3 1 1
4 2 3 3
 Y$C - rep(as.numeric(NA), nrow(Y))
 
 sortLevels - unique(Y$A)
## You can also use levels() instead of unique() if Y$A is a factor.
 
 for(i in seq(along = sortLevels)) {
+   sortIdxp - Y$A == sortLevels[i]
+   Y$C[sortIdxp] - rank(Y$B[sortIdxp], ties.method = random)
+ }
 Y
  A B C
3 1 4 3
7 3 2 2
9 3 3 3
6 2 1 2
5 2 1 1
1 1 2 1
2 1 3 2
8 3 1 1
4 2 3 3
 oY - order(Y$A)
 Y[oY,]
  A B C
3 1 4 3
1 1 2 1
2 1 3 2
6 2 1 2
5 2 1 1
4 2 3 3
7 3 2 2
9 3 3 3
8 3 1 1


 
HTH
 

Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: [EMAIL PROTECTED]
tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3

Canada


 

 


 -Original Message-
 From: [EMAIL PROTECTED] [mailto:r-help-
 [EMAIL PROTECTED] On Behalf Of Lukas Biewald
 Sent: Wednesday, April 18, 2007 2:49 PM
 To: r-help@stat.math.ethz.ch
 Subject: [R] Computing an ordering on subsets of a data frame
 
 If I have a data frame X that looks like this:
 
 A B
 - -
 1 2
 1 3
 1 4
 2 3
 2 1
 2 1
 3 2
 3 1
 3 3
 
 and I want to make another column which has the rank of B computed
 separately for each value of A.
 
 I.e. something like:
 
 A B C
 - - -
 1 2 1
 1 3 2
 1 4 3
 2 3 3
 2 1 1
 2 1 2
 3 2 2
 3 1 1
 3 3 3
 
 by(X, X[,1], function(x) { rank(x[,1], ties.method=random) } )
almost
 seems to work, but the data is not in a frame, and I can't figure out
how
 to
 merge it back into X properly.
 
 Thanks,
 Lukas
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] extracting intercept from ppr fit

2007-04-18 Thread Steven McKinney


Hi Vadim,

Estimates of the alpha_0 terms in MASS are the $yb component
of the object returned by ppr().  As I understand it,
the original PPR algorithm assumes the response
variable(s) are centered, so the 'alpha_0' term in
MASS is just the mean of the response if the user
does not center the responses.  Your actual 'a' term
will not appear in the output of ppr.

 n - 1000
 data - data.frame(x= rnorm (n), y= rnorm (n))
 a - 10
 data$z - evalq(a + atan (x + y) + rnorm (n), data)
 data.ppr - ppr(z ~ x + y, data=data, nterms =1)
 ## how to extract a = 10 from data.ppr?
 data.ppr$yb
[1] 9.973964
 a - 210
 data$z - evalq(a + atan (x + y) + rnorm (n), data)
 data.ppr - ppr(z ~ x + y, data=data, nterms =1)
 data.ppr$yb
[1] 209.9773
 


HTH

Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: [EMAIL PROTECTED]

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3
Canada




-Original Message-
From: [EMAIL PROTECTED] on behalf of Vadim Ogranovich
Sent: Tue 4/17/2007 1:06 PM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] extracting intercept from ppr fit
 
Sorry for triple-posting : I seem to have a problem w/ my mail client. 

Hi, 

Is there a way, documented or not, to extract the intercept term (the alpha_0 
the MASS book) from a ppr() (Projection Persuit Regression) fit? 

Thanks, 
Vadim 

## Example: 
n - 1000 

data - data.frame(x= rnorm (n), y= rnorm (n)) 

a - 10 
data$z - evalq(a + atan (x + y) + rnorm (n), data) 

data.ppr - ppr(z ~ x + y, data=data, nterms =1) 

## how to extract a = 10 from data.ppr? 
[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Extracting approximate Wald test (Chisq) fromcoxph(..frailty)

2007-04-17 Thread Steven McKinney

$assign2
print2 - NULL
for (i in 1:nterms) {
kk - alist[[i]]
if (print.map[i]  0) {
j - print.map[i]
if (pterms[i] == 2) 
temp - (object$printfun[[j]])(object$frail, 
  object$fvar, , object$df[i], object$history[[j]])
else temp - (object$printfun[[j]])(coef[kk], object$var[kk, 
kk], object$var2[kk, kk], object$df[i], object$history[[j]])
print1 - rbind(print1, temp$coef)
if (is.matrix(temp$coef)) {
xx - dimnames(temp$coef)[[1]]
if (is.null(xx)) 
  xx - rep(names(pterms)[i], nrow(temp$coef))
else xx - paste(names(pterms)[i], xx, sep = , )
pname1 - c(pname1, xx)
}
else pname1 - c(pname1, names(pterms)[i])
print2 - c(print2, temp$history)
}
else if (terms  length(kk)  1) {
pname1 - c(pname1, names(pterms)[i])
temp - coxph.wtest(object$var[kk, kk], coef[kk])$test
print1 - rbind(print1, c(NA, NA, NA, temp, object$df[i], 
1 - pchisq(temp, 1)))
}
else {
pname1 - c(pname1, names(coef)[kk])
tempe - (diag(object$var))[kk]
temp - coef[kk]^2/tempe
print1 - rbind(print1, cbind(coef[kk], sqrt(tempe), 
sqrt((diag(object$var2))[kk]), temp, 1, 1 - pchisq(temp, 
  1)))
}
}
temp - cbind(format(print1[, 1]), format(print1[, 2]), format(print1[, 
3]), format(round(print1[, 4], 2)), format(round(print1[, 
5], 2)), format(signif(print1[, 6], 2)))
temp - ifelse(is.na(print1), , temp)
dimnames(temp) - list(substring(pname1, 1, maxlabel), c(coef, 
se(coef), se2, Chisq, DF, p))
prmatrix(temp, quote = FALSE)
if (conf.int  length(coef)  0) {
z - qnorm((1 + conf.int)/2, 0, 1)
coef - coef * scale
se - se * scale
tmp - cbind(exp(coef), exp(-coef), exp(coef - z * se), 
exp(coef + z * se))
dimnames(tmp) - list(substring(names(coef), 1, maxlabel), 
c(exp(coef), exp(-coef), paste(lower ., round(100 * 
conf.int, 2), sep = ), paste(upper ., round(100 * 
conf.int, 2), sep = )))
cat(\n)
prmatrix(tmp)
}
logtest - -2 * (object$loglik[1] - object$loglik[2])
sctest - object$score
cat(\nIterations:, object$iter[1], outer,, object$iter[2], 
Newton-Raphson\n)
if (length(print2)) {
for (i in 1:length(print2)) cat(, print2[i], \n)
}
if (is.null(object$df)) 
df - sum(!is.na(coef))
else df - round(sum(object$df), 2)
cat(Degrees of freedom for terms=, format(round(object$df, 
1)), \n)
cat(Rsquare=, format(round(1 - exp(-logtest/object$n), 
3)),   (max possible=, format(round(1 - exp(2 * 
object$loglik[1]/object$n), 
3)), )\n)
cat(Likelihood ratio test= , format(round(logtest, 2)), 
  on , df,  df,,p=, format(1 - pchisq(logtest, 
df)), \n, sep = )
if (!is.null(object$wald.test)) 
cat(Wald test= , format(round(object$wald.test, 
2)),   on , df,  df,   p=, format(1 - pchisq(object$wald.test, 
df)), sep = )
if (!is.null(object$score)) 
cat(\nScore (logrank) test = , format(round(sctest, 
2)),   on , df,  df,,p=, format(1 - pchisq(sctest, 
df)), sep = )
if (is.null(object$rscore)) 
cat(\n)
else cat(,   Robust = , format(round(object$rscore, 2)), 
  p=, format(1 - pchisq(object$rscore, df)), \n, 
sep = )
invisible(return(list(temp = temp, tmp = tmp)))
}




Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: [EMAIL PROTECTED]

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3
Canada




-Original Message-
From: [EMAIL PROTECTED] on behalf of Mohammad Ehsanul Karim
Sent: Tue 4/17/2007 2:55 PM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] Extracting approximate Wald test (Chisq) fromcoxph(..frailty)
 
Dear list,

I need to extract the approximate Wald test (Chisq) so
that I can put it in a loop. str seemed like a great
idea, but I cannot seem to find the approximate Wald
test for frailty (in the example data below: 17.89 and
its p-value 0.12000) there. I cannot seem to find it
in capture.output either as numeric form. Do I need to
modify some given values? If yes, please give me a
clue for the example:

library(survival)
kfitm1-coxph(formula = Surv(time, status) ~ age +
sex +disease + frailty(id, dist = gauss), 
data = kidney)
str(kfitm1)
capture.output( print(kfitm1) )


Mohammad Ehsanul Karim (R - 2.3.1 on windows)
wildscop at yahoo dot com
Institute of Statistical Research and Training
University of Dhaka

Re: [R] problem with RCurl install on Unix

2007-03-21 Thread Steven McKinney

I get the same problem, and haven't
figured it out yet.

Is it a 32bit/64bit clash?

(Similarly, I don't have RMySQL up and running cleanly.)




 library(RCurl)
Error in dyn.load(x, as.logical(local), as.logical(now)) : 
unable to load shared library 
'/share/apps/R/R-2.4.1/library/RCurl/libs/RCurl.so':
  libcurl.so.4: cannot open shared object file: No such file or directory
Error: package/namespace load failed for 'RCurl'


 sessionInfo()
R version 2.4.1 (2006-12-18) 
x86_64-unknown-linux-gnu 

locale:
LC_CTYPE=en_US.iso885915;LC_NUMERIC=C;LC_TIME=en_US.iso885915;LC_COLLATE=en_US.iso885915;LC_MONETARY=en_US.iso885915;LC_MESSAGES=en_US.iso885915;LC_PAPER=en_US.iso885915;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.iso885915;LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods  
[7] base 

other attached packages:
 DBI 
0.1-12 
 

Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: [EMAIL PROTECTED]

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3
Canada




-Original Message-
From: [EMAIL PROTECTED] on behalf of Mark W Kimpel
Sent: Wed 3/21/2007 3:09 PM
To: r-help@stat.math.ethz.ch
Subject: [R] problem with RCurl install on Unix
 
I am having trouble getting an install of RCurl to work properly on a 
Unix server. The steps I have taken are:

1. installed cUrl from source without difficulty
2. installed RCurl from source using the command
~/R_HOME/R-devel/bin/R CMD INSTALL -l ~/R_HOME/R-devel/library 
~/RCurl_0.8-0.tar.gz I received no errors during this install
3. when I go back to R and require(RCurl), I get

  require(RCurl)
Loading required package: RCurl
Error in dyn.load(x, as.logical(local), as.logical(now)) :
 unable to load shared library 
'/N/u/mkimpel/BigRed/R_HOME/R-devel/library/RCurl/libs/RCurl.so':
   libcurl.so.4: cannot open shared object file: No such file or directory
°1§ FALSE

Outside of R I get

mkimpelàBigRed:¨/R_HOME/R-devel/library/RCurl/libs ls
RCurl.so

I can cat into RCurl and I have even done chmod a+x RCurl.so in case 
there was a problem with permission for R to open the file.

Below is my sessionInfo. Thanks, Mark

  sessionInfo()
R version 2.5.0 Under development (unstable) (2007-03-11 r40824)
powerpc64-unknown-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
°1§ stats graphics  grDevices datasets  utils tools
°7§ methods   base

other attached packages:
 limma  affyaffyio   Biobase
  2.9.13 1.13.14   1.3.3 1.13.39
 



-- 
Mark W. Kimpel MD
Neuroinformatics
Department of Psychiatry
Indiana University School of Medicine

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] replacing all NA's in a dataframe with zeros...

2007-03-14 Thread Steven McKinney

Since you can index a matrix or dataframe with
a matrix of logicals, you can use is.na()
to index all the NA locations and replace them
all with 0 in one command.

 mydata.df - as.data.frame(matrix(sample(c(as.numeric(NA), 1), size = 30, 
 replace = TRUE), nrow = 6))
 mydata.df
  V1 V2 V3 V4 V5
1  1 NA  1  1  1
2  1 NA NA NA  1
3 NA NA  1 NA NA
4 NA NA NA NA  1
5 NA  1 NA NA  1
6  1 NA NA  1  1
 is.na(mydata.df)
 V1V2V3V4V5
1 FALSE  TRUE FALSE FALSE FALSE
2 FALSE  TRUE  TRUE  TRUE FALSE
3  TRUE  TRUE FALSE  TRUE  TRUE
4  TRUE  TRUE  TRUE  TRUE FALSE
5  TRUE FALSE  TRUE  TRUE FALSE
6 FALSE  TRUE  TRUE FALSE FALSE
 mydata.df[is.na(mydata.df)] - 0
 mydata.df
  V1 V2 V3 V4 V5
1  1  0  1  1  1
2  1  0  0  0  1
3  0  0  1  0  0
4  0  0  0  0  1
5  0  1  0  0  1
6  1  0  0  1  1
 

Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: [EMAIL PROTECTED]

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3
Canada




-Original Message-
From: [EMAIL PROTECTED] on behalf of David L. Van Brunt, Ph.D.
Sent: Wed 3/14/2007 5:22 PM
To: R-Help List
Subject: [R] replacing all NA's in a dataframe with zeros...
 
I've seen how to  replace the NA's in a single column with a data frame

* mydata$ncigs[is.na(mydata$ncigs)]-0

*But this is just one column... I have thousands of columns (!) that I need
to do this, and I can't figure out a way, outside of the dreaded loop, do
replace all NA's in an entire data frame (all vars) without naming each var
separately. Yikes.

I'm racking my brain on this, seems like I must be staring at the obvious,
but it eludes me. Searches have come up CLOSE, but not quite what I need..

Any pointers?

-- 
---
David L. Van Brunt, Ph.D.
mailto:[EMAIL PROTECTED]

If Tyranny and Oppression come to this land, it will be in the guise of
fighting a foreign enemy.
--James Madison

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] dendrogram - got it , just need to label :)

2007-03-09 Thread Steven McKinney


Here is one example of labeling nodes,
borrowing code from the help page for
the dendrapply() function.

local({
  edgeLab - function(n) {
  if(!is.leaf(n)) {
a - attributes(n)
i - i+1
attr(n, edgetext) -
format(i)
  }
  n
  }
  i - 0
 })
dL - dendrapply(as.dendrogram(hclust(dist(iris[, 1:4]), method = single)), 
edgeLab)
plot(dL)


This labels the edges above the nodes.

Martin Maechler and Robert Gentleman are developing the
dendrogram objects suite of functions.
As I have had to label nodes in S-PLUS, I'd like to put
in a request for a few more control parameters for
edge/internal node labeling control:

 - Allow the label without the polygon surrounding it.
   The polygon can obliterate too much of the dendrogram
   for larger sample sizes.  Perhaps an edgePar polygon
   plot logical p.plot taking values TRUE (default) and
   FALSE to omit the polygon.
 - Allow the label to appear near the node at the base
   of the edge.  Perhaps an edgePar text location parameter
   t.pos taking values in (0.0, 1.0) where 0.5 is in the
   middle of the edge (the default) and 1.0 is at the base 
   of the edge.

Since clusters are not always identified by 'cutting'
the dendrogram (e.g. in the iris single linkage
dendrogram plot we want to identify internal nodes that
are large runts or whose vertical edge lengths are
considerably longer than average) it is useful to be able
to identify nodes deeper in the tree.  This is aided
by having access to internal node labels/names and being
able to extract internal nodes by those labels/names.


Best

Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: [EMAIL PROTECTED]

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3
Canada




-Original Message-
From: [EMAIL PROTECTED] on behalf of bunny , lautloscrew.com
Sent: Fri 3/9/2007 2:02 PM
To: R-help@stat.math.ethz.ch
Subject: [R] dendrogram - got it , just need to label :)
 
Hi all, Hi Gavin,

thx for your help i finally found out what i want to do and how to  
fix it.
just needed to get some more level my cut level was too small...

two question remain...

a) can i somehow scale the twigs after cutting ?
b) how can i label the nodes and how to label which one...

thx !!

-m.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Double-banger function names: preferences and suggestions

2007-02-26 Thread Steven McKinney


The underscore versus left-arrow 
conundrum has its roots in the evolution
of ASCII during the middle of the last
century. 

Some Teletype machines in the 1970s (when the
S language was being developed) still had a
left arrow, and its ASCII code was used in S
as a one keystroke convenience for the assignment
operator.  The left arrow symbol was then
removed from most keyboards/printers/fontsets
and replaced by the underscore.  Thus the
underscore remained as a one keystroke assignment
operator.


See e.g.

http://www.wps.com/projects/codes/index.html#GRPH


LEFT-ARROW, ?
UNDERSCORE, _

One of the graphical codes, left-arrow mutated to the 
underscore of ASCII-1967. It may have had earlier, 
or other, meanings, but for some early programming 
languages it was assignment, eg.

  c ? b + a

C is assigned the sum of B and A.




Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: [EMAIL PROTECTED]

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3
Canada




-Original Message-
From: [EMAIL PROTECTED] on behalf of Marc Schwartz
Sent: Sun 2/25/2007 8:28 AM
To: Alberto Vieira Ferreira Monteiro
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Double-banger function names: preferences and suggestions
 
On Sun, 2007-02-25 at 15:56 +, Alberto Vieira Ferreira Monteiro
wrote:
 hadley wickham wrote:
 
  What do you prefer/recommend for double-banger function names:
 
   1 scale.colour
   2 scale_colour
   3 scaleColour
 
  1 is more R-like, but conflicts with S3.  2 is a modern version of
  number 1, but not many packages use it.  Number 3 is more java-like.
  (I like number 2 best)
 
  Any suggestions?
 
 I always prefer 2, but this would make it non-portable to S-Plus. S-Plus
 has a bug, where _ is the equivalent to - (why would they do this? I
 prefer to think it's stupidity and not villainy)

That's not a bug. If you search the archives of both the S-PLUS list and
the R lists, you will see highly energized discussion on the use of the
underscore operator.

In R, the use of '_' was allowed for assignment up until version 1.8.0
when:

DEPRECATED  DEFUNCT

o   The assignment operator `_' has been removed.


and subsequently allowed in names in version 1.9.0 when:

 o  Underscore '_' is now allowed in syntactically valid names, and
make.names() no longer changes underscores.  Very old code
that makes use of underscore for assignment may now give
confusing error messages.


Not to further contribute to the dialog on 'style', but to further
contribute ;-), for those who have coded in the Windows environment (ie.
C, VBA, etc.) the extension of sorts to number 3 is of course Hungarian
Notation, named after Charles Simonyi, originally at Xerox PARC and
later senior developer/architect at MS. The extension was the inclusion
of the data type prefix, such as fnScaleColour to indicate that this was
a function, with the name using caps to make words more distinct.

And no, I'm not advocating that use...I have been guilty myself of using
variants of 1 and 3, perhaps driven by my circulating caffeine levels as
much as anything else.

HTH,

Marc Schwartz
Off to go remove 12 inches of snow from the driveway and sidewalk...oy

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Google Custom Search Engine for R

2007-02-23 Thread Steven McKinney


Whereas R is very generic,
CRAN is much less so.

I've had very good luck adding CRAN
to my search terms, e.g. try to Google

cran 3d scatterplot

This produces all R-related hits on
the first Google page.


Hope this helps


Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: [EMAIL PROTECTED]

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3
Canada




-Original Message-
From: [EMAIL PROTECTED] on behalf of Matthew Keller
Sent: Fri 2/23/2007 7:51 AM
To: Sérgio Nunes
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Google Custom Search Engine for R
 
Hi Sergio,

There was a discussion on this board recently about the difficulty of
searching for R related material on the web. I think the custom
google search engine is a good idea. It would be helpful if we could
have access to the full list of websites it is indexing so that we
could make suggestions about other sites that are missing. As it is,
it only tells us that there are 35 websites, and shows us the first
several.

Also, you might check out Sasha Goodman's Rseek: http://www.rseek.org/

Have you tried to compare the success of yours with Rseek?

All the Best,

Matt

On 2/23/07, Sérgio Nunes [EMAIL PROTECTED] wrote:
 Hi,

 Since R is a (very) generic name, I've been having some trouble
 searching the web for this topic. Due to this, I've just created a
 Google Custom Search Engine that includes several of the most relevant
 sites that have information on R.

 See it in action at:

 http://google.com/coop/cse?cx=018133866098353049407%3Aozv9awtetwy

 This is really a preliminary test. Feel free to add yourself to the
 project and contribute with suggestions.

 Sérgio Nunes

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Matthew C Keller
Postdoctoral Fellow
Virginia Institute for Psychiatric and Behavioral Genetics

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to plot two graphs on one single plot?

2007-02-23 Thread Steven McKinney


Broadly speaking, there are a few classes 
of plotting functions.

1)  New graph functions.
The plot() function starts a new graph.

2)  Add to a graph functions
The points(), lines(), text() etc. functions
add something to the current graph.

3)  Control graph functions
par() controls various aspects of the graph.

R graphics experts might have some better
classification and terminology.

So you want your second plotting function to be
points() rather than plot(), to add to the
existing graph.  

Try

  x1=rnorm(25, mean=0, sd=1)
  y1=dnorm(x1, mean=0, sd=1)
 
  x2=rnorm(25, mean=0, sd=1)
  y2=dnorm(x2, mean=0, sd=1)
  plot(x1, y1, type='p', xlim=range(x1,x2), ylim=range(y1, y2), xlab='x', 
 ylab='y')
  points(x2, y2, type='p', col=red, xlab='x', ylab='y')
 

Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: [EMAIL PROTECTED]

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3
Canada




-Original Message-
From: [EMAIL PROTECTED] on behalf of Yun Zhang
Sent: Fri 2/23/2007 7:34 AM
To: Henrique Dallazuanna
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] How to plot two graphs on one single plot?
 
Thanks. Now R plots two graphs on one plot.
Yet they are still on two graphs, vertically parallelized with each other.

But what I want to do is actually plotting two distribution on one 
single graph, using the same x and y axis. Like:
|
|
|   (dist2)
|   (dist 1)
|
---

Is it possible to do that?

Thanks,
Yun

Henrique Dallazuanna wrote:
 par(mfrow=c(2,1))
 #your plot
 #after plot
 par(mfrow=c(1,1))

 On 23/02/07, *Yun Zhang* [EMAIL PROTECTED] 
 mailto:[EMAIL PROTECTED] wrote:

 Hi,

 I am trying to plot two distribution graph on one plot. But I dont
 know
 how. I set them to the same x, y limit, even same x, y labels.

 Code:
 x1=rnorm(25, mean=0, sd=1)
 y1=dnorm(x1, mean=0, sd=1)

 x2=rnorm(25, mean=0, sd=1)
 y2=dnorm(x2, mean=0, sd=1)
 plot(x1, y1, type='p', xlim=range(x1,x2), ylim=range(y1, y2),
 xlab='x',
 ylab='y')
 plot(x2, y2, type='p', col=red, xlab='x', ylab='y')

 They just dont show up in one plot.

 Any hint will be very helpful.

 Thanks,
 Yun

 __
 R-help@stat.math.ethz.ch mailto:R-help@stat.math.ethz.ch mailing
 list
 https://stat.ethz.ch/mailman/listinfo/r-help
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




 -- 
 Henrique Dallazuanna
 Curitiba-Paraná
 Brasil

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Bug/problem reporting: Possible to modify posting guide FAQ?

2006-08-28 Thread Steven McKinney


If users post a bug or problem issue to an R-based news group
(R-devel, R-help, BioC - though BioC is far more forgiving)
they get yelled at for not reading the posting guide
and FAQ.

Please *_do_* read the FAQ, the posting guide, ...
the yellers do say.  So I read the BioC FAQ and it says...

http://www.bioconductor.org/docs/faq/

Bug reports on packages should perhaps be sent to the 
 package maintainer rather than to r-bugs.


So I send email to a maintainer, who I believe rightly points out

   best to send this kind of questions to the bioc mailing list, rather
than to myself privately, because other people might (a) also have
answers or (b) benefit from the questions  answers.

Could the FAQ possibly be revised to some sensible combination
that generates less finger pointing, such as

   Bug reports on packages should be sent to the Bioconductor mailing list, 
and sent or copied to the package maintainer, rather than to r-bugs.

or

   Bug reports on packages should be sent to the package maintainer, 
and copied to the Bioconductor mailing list, rather than to r-bugs.


Could the posting guides to R-help and R-devel do something
similar?


Sign me
Tired of all the finger pointing


http://www.r-project.org/posting-guide.html

 If the question relates to a contributed package , e.g., one downloaded 
  from CRAN, try contacting the package maintainer first. You can also 
  use find(functionname) and packageDescription(packagename) to 
  find this information. Only send such questions to R-help or R-devel if 
  you get no reply or need further assistance. This applies to both 
  requests for help and to bug reports.


How about

If the question relates to a contributed package , e.g., one downloaded 
from CRAN, email the list and be sure to additionally send to or copy to 
the package maintainer as well. You can use find(functionname) 
and packageDescription(packagename) to find this information. 
Only send such questions to one of R-help or R-devel. This applies to both 
requests for help and to bug reports.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] How do I modify an exported function in a locked environment?

2006-07-20 Thread Steven McKinney



Running R.app on Mac OS X 10.4

 version
   _ 
platform   powerpc-apple-darwin8.6.0 
arch   powerpc   
os darwin8.6.0   
system powerpc, darwin8.6.0  
status   
major  2 
minor  3.1   
year   2006  
month  06
day01
svn rev38247 
language   R 
version.string Version 2.3.1 (2006-06-01)
 


I am trying to learn how to modify functions
in a locked environment.

For an example, suppose I'm using the package zoo.

zoo contains function rollmean.default

 rollmean.default
function (x, k, na.pad = FALSE, align = c(center, left, right), 
...) 
{
x - unclass(x)
n - length(x)
y - x[k:n] - x[c(1, 1:(n - k))]
y[1] - sum(x[1:k])
rval - cumsum(y)/k
if (na.pad) {
rval - switch(match.arg(align), left = {
c(rval, rep(NA, k - 1))
}, center = {
c(rep(NA, floor((k - 1)/2)), rval, rep(NA, ceiling((k - 
1)/2)))
}, right = {
c(rep(NA, k - 1), rval)
})
}
return(rval)
}
environment: namespace:zoo

Suppose for whatever reason I want output to be
in percent, so I'd like to modify the result to be
rval - 100 * cumsum(y)/k

I cannot just copy the function and change it, as the namespace
mechanism ensures the rollmean.default in 'zoo' continues to be used.

If I use
fixInNamespace(rollmean.default, ns = zoo)
I can edit the rval - cumsum(y)/k line to read
rval - 100 * cumsum(y)/k
save the file and exit the R.app GUI editor.

But this does not update the exported copy of the
function (the documentation for fixInNamespace says
this is the case) - how do I accomplish this last step?

If I list the function after editing, I see the original
copy.  But if I reinvoke the editor via fixInNamespace(),
I do see my modification.
Where is my copy residing?  How do I push it out
to replace the exported copy?

Is this the proper way to modify a package function?
Are there other ways?  I've searched webpages, R news,
help files and have been unable to find out how to
get this process fully completed.

Any guidance appreciated.


Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: [EMAIL PROTECTED]

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3
Canada

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] scripts to process array CGH data files from NimbleGen

2006-06-12 Thread Steven McKinney



Before I reinvent the wheel, has anyone set up scripts to read in
array CGH data files from NimbleGen, for use in the aCGH package or
a similar package?

I have normalized NimbleGen files
([array_id]_[identifier]_normalized.txt in the NimbleGen naming scheme)
several to a directory, to read in and use to create an aCGH object.

Any info appreciated.

Best

Steve McKinney



Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

[EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] lmer coefficient distributions and p values

Re: [R] Systematically biased count data regression model

Re: [R] FW: Selecting undefined column of a data frame (was[BioC]read.phenoData vs read.AnnotatedDataFrame)

[R] FW: Selecting undefined column of a data frame (was [BioC] read.phenoData vs read.AnnotatedDataFrame)

Re: [R] FW: Selecting undefined column of a data frame (was [BioC]read.phenoData vs read.AnnotatedDataFrame)

Re: [R] FW: Selecting undefined column of a data frame (was [BioC] read.phenoData vs read.AnnotatedDataFrame)

Re: [R] FW: Selecting undefined column of a data frame (was [BioC] read.phenoData vs read.AnnotatedDataFrame)

Re: [R] FW: Selecting undefined column of a data frame (was [BioC] read.phenoData vs read.AnnotatedDataFrame)

Re: [R] FW: Selecting undefined column of a data frame (was [BioC]read.phenoData vs read.AnnotatedDataFrame)

Re: [R] Extracting a website text content using R

[R] Design matrix question

Re: [R] reodering factor

Re: [R] Computing an ordering on subsets of a data frame

Re: [R] extracting intercept from ppr fit

Re: [R] Extracting approximate Wald test (Chisq) fromcoxph(..frailty)

Re: [R] problem with RCurl install on Unix

Re: [R] replacing all NA's in a dataframe with zeros...

Re: [R] dendrogram - got it , just need to label :)

Re: [R] Double-banger function names: preferences and suggestions

Re: [R] Google Custom Search Engine for R

Re: [R] How to plot two graphs on one single plot?

[R] Bug/problem reporting: Possible to modify posting guide FAQ?

[R] How do I modify an exported function in a locked environment?

[R] scripts to process array CGH data files from NimbleGen

24 matches

Site Navigation

Mail list logo

Footer information