Re: [R] Random Forest

2007-04-23 Thread Arne.Muller
Ruben,

Maybe your binary response is a numeric vector - try converting it into
a factor with two levels. You probably want classification rather than
regression (for regression the dependent variable would be numeric and continuous)!
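
For example, a minimal sketch (the data frame 'd' and its response column 'y' are made up):

library(randomForest)
## a numeric 0/1 response makes randomForest() do regression;
## converting it to a factor switches it to classification
d$y <- factor(d$y)
rf  <- randomForest(y ~ ., data = d)
print(rf)   # should now report "Type of random forest: classification"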

   Arne

-Original Message-
From: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] On Behalf Of Ruben Feldman
Sent: Monday, April 23, 2007 10:28 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Random Forest

Hi R-wizards,

I ran a random forest on a dataset where the response variable 
had two possible values. It returned a warning telling me that 
it did regression and asking if that was really what I wanted.
Does anybody know what is being done in terms of the algorithm when 
it does a regression? (If the random forest is used as a 
regression, how does that work?)

Thanks for your time!

Ruben

   [[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] splitting very long character string

2006-11-02 Thread Arne.Muller
Hello,

thanks a lot for your help on splitting the string to get a numeric vector. I'm 
now writing the string to a tempfile and reading it in via scan - this is fast 
enough for me:

library(XML);

...
tmp = xmlElementsByTagName(root, 'tofDataSample', recursive=T);
tmp = xmlValue(tmp[[1]]);      # the long newline-separated string
cat(paste('splitting string of', nchar(tmp), 'characters ...\n'));
tmp.file = tempfile();
sink(tmp.file);                # redirect output to the temp file
cat(tmp);                      # dump the string
sink();
tmp = scan(tmp.file);          # read it back as a numeric vector
unlink(tmp.file);
cat(paste('splitting done,', length(tmp), 'elements\n'));
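
An alternative sketch that avoids the temporary file (untested on the original
XML data; textConnection treats the embedded newlines as line breaks):

con <- textConnection(tmp)
tmp <- scan(con)
close(con)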

thanks again
and kind regards,

Arne

 -Original Message-
 From: john seers (IFR) [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, November 01, 2006 17:01
 To: Muller, Arne PH/FR; r-help@stat.math.ethz.ch
 Subject: RE: [R] splitting very long character string
 
 
 
 Hi Arne
 
 If you are reading in from files and they are just one number per line
 it would be more efficient to use scan directly.  ?scan
 
 For example:
 
  filen <- "C:/temp/tt.txt"
  i <- scan(filen)
 Read 5 items
  i
 [1]   12345  564376    5674 6356656    5666
  
 
 
  
 
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of
 [EMAIL PROTECTED]
 Sent: 01 November 2006 15:47
 To: r-help@stat.math.ethz.ch
 Subject: [R] splitting very long character string
 
 
 Hello,
 
 I've a very long character array (500k characters) that I need to split
 by '\n', resulting in an array of about 60k numbers. The help on strsplit
 says to use perl=TRUE to get better performance, but still it takes several
 minutes to split this string.
 
 The massive string is the return value of a call to xmlElementsByTagName
 from the XML library and looks like this:
 
 
 12345
 564376
 5674
 6356656
 5666
 
 
 I have to read about a hundred of these files and was wondering whether
 there's a more efficient way to turn this string into an array of
 numerics. Any ideas?
 
   thanks a lot for your help
   and kind regards,
 
   Arne
 
 
 
 
   [[alternative HTML version deleted]]
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] splitting very long character string

2006-11-01 Thread Arne.Muller
Hello,

I've a very long character array (500k characters) that I need to split by '\n', 
resulting in an array of about 60k numbers. The help on strsplit says to use 
perl=TRUE to get better performance, but still it takes several minutes to split 
this string.

The massive string is the return value of a call to xmlElementsByTagName from 
the XML library and looks like this:

...
12345
564376
5674
6356656
5666
...

I have to read about a hundred of these files and was wondering whether there's a 
more efficient way to turn this string into an array of numerics. Any ideas?
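
For reference, the direct approach described above looks like this (a sketch;
'tmp' is the long string):

nums <- as.numeric(strsplit(tmp, "\n", fixed = TRUE)[[1]])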

thanks a lot for your help
and kind regards,

Arne




[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] graphics and 'layout' question

2006-09-15 Thread Arne.Muller
Hello,

I got stuck with a graphics question: I've 3 figures that I present on a single 
page (window) via 'layout'. The layout is 

layout(matrix(c(1,1,2,3), 2, 2, byrow=TRUE));

so that the first plot spans both columns in row one. Now I'd like to 
magnify the first figure so that it takes 20% more vertical space (i.e. more 
space for the y-axis). How would I do this in R?
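
One possibility (a sketch; the 1.2 simply gives the top row 20% more height
than the bottom row):

layout(matrix(c(1,1,2,3), 2, 2, byrow=TRUE), heights=c(1.2, 1));
layout.show(3)   # check the arrangement before plotting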

thanks a lot for your help,

Arne

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] randomForest question

2006-07-26 Thread Arne.Muller
Hello,

I've a question regarding randomForest (from the package with the same name). I've 
16 features (nominal factors), 159 positive and 318 negative cases that I'd like to 
classify (binary classification).

Using the tuning from the e1071 package it turns out that the best performance 
is reached when using all 16 features per tree (mtry=16). However, the 
documentation of randomForest suggests taking sqrt(#features), i.e. 4. How 
can I explain this difference? When using all features this is the same as a 
classical decision tree, with the difference that the tree is built and tested 
with different data sets, right?

example (I've tried different configurations, incl. changing ntree):
 param <- try(tune(randomForest, class ~ ., data=d.all318,
              range=list(mtry=c(4, 8, 16), ntree=c(1000))))

 summary(param)

Parameter tuning of `randomForest':

- sampling method: 10-fold cross validation 

- best parameters:
 mtry ntree
   16  1000

- best performance: 0.1571809 

- Detailed performance results:
  mtry ntree     error
1    4  1000 0.1928635
2    8  1000 0.1634752
3   16  1000 0.1571809

thanks a lot for your help,

kind regards,

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] data import problem

2006-03-08 Thread Arne.Muller
Dear All,

I'm trying to read a text data file that contains several records separated by 
a blank line. Each record starts with a row that contains its ID and the 
number of rows for the record (two columns), then the data table itself, e.g. 

123 5
89.1791   1.1024
90.5735   1.1024
92.5666   1.1024
95.0725   1.1024
101.2070  1.1024

321 3
60.1601   1.1024
64.8023   1.1024
70.0593   2.1502

...

I thought I could simply use something like this:

con <- file("test2.txt");
repeat {
    e <- read.table(con, nrows = 1);
    if ( length(e) != 2 ) break;
    d <- read.table(con, nrows = e[1,2]);
    # process data frame d
}

The problem is that read.table closes the connection object; I assumed that it 
would not close the connection and instead continue where it last stopped.

Since the data is nearly a simple table I thought read.table could work rather 
than using scan directly. Any suggestions for reading this file efficiently are 
welcome (the file can contain several thousand records and each record can 
contain several thousand rows).
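
For illustration, a sketch of the same loop with an explicitly opened connection
(file name as above): because the connection is already open, read.table continues
where the previous call stopped instead of closing it.

con <- file("test2.txt", open = "r");
repeat {
    e <- try(read.table(con, nrows = 1), silent = TRUE);
    if ( inherits(e, "try-error") || length(e) != 2 ) break;
    d <- read.table(con, nrows = e[1,2]);
    # process data frame d
}
close(con);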

thanks a lot for your help,
+kind regards,

Arne


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] data import problem

2006-03-08 Thread Arne.Muller
Well, the data is generated by a perl script, and I could just configure the 
perl script so that there is one file per data table, but I thought it would 
probably be much more efficient to have all records in a single file rather than 
reading thousands of small files ... .

kind regards,

Arne

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Philipp Pagel
Sent: Wednesday, March 08, 2006 12:44
To: r-help@stat.math.ethz.ch
Subject: Re: [R] data import problem


On Wed, Mar 08, 2006 at 12:32:28PM +0100, [EMAIL PROTECTED] wrote:
 I'm trying to read a text data file that contains several records
 separated by a blank line. Each record starts with a row that contains
 it's ID and the number of rows for the records (two columns), then the
 data table itself, e.g. 
 
 123 5
 89.1791   1.1024
 90.5735   1.1024
 92.5666   1.1024
 95.0725   1.1024
 101.2070  1.1024
 
 321 3
 60.1601   1.1024
 64.8023   1.1024
 70.0593   2.1502

That sounds like a job for awk. I think it will be much easier to
transform the data into a flat table using awk, python or perl and then
just read the table with R. 

cu
Philipp

-- 
Dr. Philipp PagelTel.  +49-8161-71 2131
Dept. of Genome Oriented Bioinformatics  Fax.  +49-8161-71 2186
Technical University of Munich
Science Center Weihenstephan
85350 Freising, Germany

 and

Institute for Bioinformatics / MIPS  Tel.  +49-89-3187 3675
GSF - National Research Center   Fax.  +49-89-3187 3585
  for Environment and Health
Ingolstädter Landstrasse 1
85764 Neuherberg, Germany
http://mips.gsf.de/staff/pagel

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] calculating IC50

2006-02-02 Thread Arne.Muller
Hello,

I was wondering if there is an R package to automatically calculate the IC50 
value (the concentration of a substance that inhibits cell growth by 50%) for some 
measurements.
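
For illustration, a minimal base-R sketch with made-up data: fit a four-parameter
log-logistic dose-response curve with nls() and read off the IC50 parameter
(dedicated dose-response packages would be an alternative).

d <- data.frame(conc = c(0.1, 0.3, 1, 3, 10, 30, 100),
                resp = c(98, 95, 80, 55, 30, 12, 5))
fit <- nls(resp ~ bottom + (top - bottom) / (1 + (conc / ic50)^hill),
           data  = d,
           start = list(bottom = 5, top = 100, ic50 = 3, hill = 1))
coef(fit)["ic50"]   # estimated IC50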

kind regards,

Arne


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Dynamic Programming in R

2006-01-20 Thread Arne.Muller
Hello,

I've implemented dynamic programming for aligning spectral data (usually 100 to 
200 peaks per spectrum, but some spectra contain > 5k peaks) entirely in R. 
As François Pinard pointed out, the memory usage should be proportional to the 
n x n dynamic programming matrix, and I've not yet had any problems on my 
machine (R 2.2.0, win2k, 1GB mem, 2GHz Intel P4); CPU seems to be the more 
problematic issue. 

I guess it all depends on how much data you have. You could split the dynamic 
programming matrix into chunks and calculate them in parallel on different 
machines (but the implementation of finding the optimal trace will probably 
get a bit difficult).
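
For illustration, a toy sketch of the n x n dynamic-programming matrix described
above (the scoring is made up; real peak alignment would use a problem-specific score):

align.score <- function(a, b, gap = -1) {
    n <- length(a); m <- length(b)
    D <- matrix(0, n + 1, m + 1)
    D[1, ] <- gap * (0:m)   # leading gaps in a
    D[, 1] <- gap * (0:n)   # leading gaps in b
    for (i in 1:n)
        for (j in 1:m)
            D[i + 1, j + 1] <- max(D[i, j] - abs(a[i] - b[j]),  # match peaks i and j
                                   D[i, j + 1] + gap,           # gap in b
                                   D[i + 1, j] + gap)           # gap in a
    D[n + 1, m + 1]   # optimal alignment score
}
align.score(c(100.1, 150.2, 200.3), c(100.0, 200.5))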

kind regards,

Arne

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Arnab mukherji
Sent: Thursday, January 19, 2006 22:55
To: r-help@stat.math.ethz.ch
Subject: [R] Dynamic Programming in R


Hi R users,

I am looking to numerically solve a dynamic program in the R environment. I was 
wondering if there were people out there who have had success using R 
for such applications. I'd rather continue in R than learn Matlab.

A concern that has been cited that may discourage R use for solving dynamic 
programs is its memory handling.  A senior researcher had a lot of 
trouble with R because on any given run it would eat up all the computer's 
memory and need to start using the hard disk. Yet, the memory needed was not 
substantial - saving the workspace, exiting and reloading would noticeably restart 
the program at a much lower level of memory use, followed by a quick deterioration 
over a few thousand iterations.

Is this a problem other people have come across? Perhaps it's a problem already 
fixed, since the researcher was working on this in 2002 (he claimed he had 
tried it on Windows, Mac, and Unix versions to check). 

Thanks.

Arnab

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] RMySQL/DBI

2006-01-06 Thread Arne.Muller
Hello,

does anybody run RMySQL/DBI successfully on SunOS 5.8 and MySQL 3.23.53? I 
get a segmentation fault when trying to call dbConnect. We'll soon switch to 
MySQL 4; however, I was wondering whether the very ancient MySQL version really 
is the problem ... 

RMySQL 0.5-5
DBI 0.1-9
R 2.2.0
SunOS 5.8

kind regards and thanks a lot for your help,

Arne


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] trellis: style of axis labels

2005-12-12 Thread Arne.Muller
Hello,

is it possible to get xyplot from package lattice to acknowledge par(las=2)? In 
my trellis plot the x-axis labels are overlapping (they're factors with rather 
long level names), and I'd like to have them vertical. The trellis plot doesn't 
seem to read the 'par' settings, and trellis.par.set doesn't help either :-( 
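
For reference, a sketch of the lattice way of doing this (formula and data are
placeholders):

library(lattice)
xyplot(y ~ f | g, data = d,
       scales = list(x = list(rot = 90)))   # rotate x-axis labels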

thanks for your help,
+kind regards,

Arne


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] data frames and factors

2005-11-24 Thread Arne.Muller
Hello,

I have trained an svm on some training data and would like to use the svm 
model for predicting a binary outcome from new data.

The input data frame contains several numeric and factor variables. Usually I 
construct the input matrix of the entities to be predicted with a perl script 
that writes it to a file (since the data comes from different sources and some 
text processing is needed). This file is then read via read.table within 
R. It is possible that I'd like to perform prediction on many new cases or on a 
single new case.

There are now two problems:

1. If the constructed matrix for the cases to be predicted does not contain all 
the factor levels that were used to build the model (the factor levels found in 
the training set), the svm throws an error (Error in scale ...).

I've tried converting the columns to factors, but instead of getting the level 
labels I get the numeric values:

 tmp <- sapply(11:15, function(i) factor(new.dat[,i],
               levels=c('A','C','G','T')))
 tmp
      [,1] [,2] [,3] [,4] [,5]
 [1,]    3    4    4    2    2
 [2,]    4    2    2    1    1
 [3,]    2    1    1    1    1
 [4,]    1    1    1    1    1
 [5,]    1    1    2    1    3
 [6,]    2    1    3    4    3
 [7,]    3    4    3    3    1
 [8,]    3    3    1    4    1
 [9,]    1    4    1    1    4
[10,]    1    1    4    4    4

 new.dat[,14]
 [1] C A A A A T G T A T


2. When reading a data frame with the variables and factors for a single new 
case (one row), read.table always treats the variables as strings (variables 
and factors), and worse - one of the factors contains a level named 'T' that is 
replaced by TRUE during read.table. I've tried as.is = T and F, and the result 
for the single row data frame is the same (T is replaced by TRUE). I'm running 
R 2.1.0.

Any suggestions on how to read a data frame (with at least one row) and treat 
factor columns as such, and how to adjust the factor levels before passing the 
data frame to predict.svm?
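
For illustration, a sketch along these lines (file name made up; column indices
and levels as in the example above): reading everything as character keeps 'T'
from becoming TRUE, and assigning the factors column by column keeps the data
frame structure (sapply would collapse it to a matrix of codes).

new.dat <- read.table("newcases.txt", colClasses = "character")
for (i in 11:15)
    new.dat[[i]] <- factor(new.dat[[i]], levels = c('A','C','G','T'))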

thanks in advance,
+kind regards,

Arne


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] basic anova and t-test question

2005-08-26 Thread Arne.Muller
Hello,

I'm posting this to get some comments/hints on a question that is more statistical 
than R-technical ... .

In an anova of an lme fit the factor SSPos11 shows up non-significant, but in the 
t-tests of the summary 2 of the 4 levels (one is the reference for the contrasts) 
are significant. See below for some truncated output.

I realize that the two tests are different (F-test/t-test), but I'm looking 
for the meaning. Maybe you have a scenario that explains how these differences 
can arise and how you'd go ahead and analyse it further.

When I use SSPos11 as the only fixed effect, it is not significant in 
either the anova or the t-test, and a boxplot of the factor shows that the levels are 
all quite similar (similar variance and mean). Might the effect I observe be 
linked to an unbalanced design in the multifactorial model?

thanks a lot for your help,
+kind regards,

Arne

 anova(fit)
            numDF denDF  F-value p-value
(Intercept)     1   540 323.4442  <.0001
SSPos1          3   540  15.1206  <.0001
...
SSPos11         3   540   1.1902  0.3128
...

 summary(fit)
Linear mixed-effects model fit by REML
 Data: d.orig 
   AIC  BIClogLik
  1007.066 1153.168 -469.5329

Random effects:
 Formula: ~1 | Method
(Intercept)  Residual
StdDev:   0.4000478 0.4943817

Fixed effects: log(value + 7.5) ~ SSPos1 + SSPos2 + SSPos6 + SSPos7 + SSPos10 + 
SSPos11 + SSPos13 + SSPos14 + SSPos18 + SSPos19 +  
  Value  Std.Error  DF   t-value p-value
(Intercept)   2.8621811 0.23125065 540 12.376964  0.0000
SSPos1C      -0.1647937 0.06293993 540 -2.618269  0.0091
SSPos1G      -0.3448095 0.05922479 540 -5.822047  0.0000
SSPos1T   0.1083988 0.06087095 540  1.780797  0.0755
...
SSPos11C -0.1540292 0.06171635 540 -2.495761  0.0129
SSPos11G -0.1428980 0.05993122 540 -2.384368  0.0175
SSPos11T -0.0039434 0.06133920 540 -0.064289  0.9488
...


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] RandomForest question

2005-07-21 Thread Arne.Muller
Hello,

I'm trying to find out the optimal value of the mtry parameter (the number of 
variables tried at each split) for a randomForest classification. The classification 
is binary and there are 32 explanatory variables (mostly factors with up to 4 levels 
each, but also some numeric variables) and 575 cases.

I've seen that although there are only 32 explanatory variables, the best 
classification performance is reached when choosing mtry=80. How is it possible 
that more variables can be used than there are columns in the data frame?

thanks for your help
+ kind regards,

Arne




[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] p-values for classification

2005-07-01 Thread Arne.Muller
Dear All,

I'm classifying some data with various methods (binary classification). I'm 
interpreting the results via a confusion matrix from which I calculate the 
sensitivity and the fdr. The classifiers are trained on 575 data points and my 
test set has 50 data points.

I'd like to calculate p-values for the observed fdr and sensitivity of each 
classifier. I was thinking about shuffling/bootstrapping the labels of the test 
set, classifying them and calculating the p-value from the obtained (assumed 
normally distributed) random fdr and sensitivity values.

The problem is that it's rather slow when running many rounds of 
shuffling/classification (I'd like to do this for many classifiers and 
parameter combinations). In addition, classification of the 50 test data points 
with shuffled labels realistically produces only a very limited number of 
possible fdr's and sensitivities, and I'm wondering if I can really believe 
these values to be normal.

Basically I'm looking for a way to calculate the p-values analytically. I'd be 
happy for any suggestions, web addresses or references.

kind regards,

Arne

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] randomForest error

2005-06-30 Thread Arne.Muller
Hello,

I'm using the randomForest package. One of my factors in the data set contains 
41 levels (I can't code this as a numeric value - in terms of linear models 
this would be a random factor). The randomForest call comes back with an error 
telling me that the limit is 32 categories.

Is there any reason for this particular limit? Maybe it's possible to recompile 
the package with a different cutoff?

thanks a  lot for your help,
kind regards,


Arne

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] svm and scaling input

2005-06-28 Thread Arne.Muller
Dear All,

I've a question about scaling the input variables for an analysis with svm 
(package e1071). Most of my variables are factors with 4 to 6 levels, but there 
are also some numeric variables.

I'm not familiar with the math behind svms, so my assumptions may be completely 
wrong ... or obvious. Will the svm automatically expand the factors into a 
binary matrix? If I add numeric variables outside the range of 0 to 1, do I have 
to scale them to the 0 to 1 range?
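
For illustration, a minimal sketch (made-up data frame): model.matrix() shows how
a factor gets expanded into indicator columns, and svm() in e1071 scales numeric
columns itself by default (scale = TRUE).

d <- data.frame(y = factor(c("a", "b", "a", "b")),
                f = factor(c("A", "C", "G", "T")),
                x = c(0.5, 12, 3.7, 150))
model.matrix(y ~ f + x, data = d)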

thanks a lot for help,

+kind regards,

Arne

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] bug in predict.lme?

2005-06-08 Thread Arne.Muller
Dear All,

I've come across a problem in predict.lme. Assigning a model formula to a 
variable and then using this variable in lme (instead of typing the formula 
into the formula argument of lme) works as expected. However, when performing a 
predict on the fitted model I get an error message - predict.lme (but not 
predict.lm) seems to expect a 'properly' typed-in formula and cannot extract 
the formula from the variable. The code below demonstrates this.

Is this a known or expected behaviour of predict.lme or is this a bug?

kind regards,

Arne

(R-2.1.0)
 library(nlme)
...
 mod <- distance ~ age + Sex # example from ?lme
 mod
distance ~ age + Sex
 fm2 <- lme(mod, data = Orthodont, random = ~ 1)
 anova(fm2)
            numDF denDF  F-value p-value
(Intercept)     1    80 4123.156  <.0001
age             1    80  114.838  <.0001
Sex             1    25    9.292  0.0054
 fm2
Linear mixed-effects model fit by REML
  Data: Orthodont 
  Log-restricted-likelihood: -218.7563
  Fixed: mod 
 
...

 predict(fm2,  Orthodont)
Error in mCall[["fixed"]][-2] : object is not subsettable

 fm2 <- update(fm2, . ~ .) # this replaces mod by the contents of variable mod
 fm2
Linear mixed-effects model fit by REML
  Data: Orthodont 
  Log-restricted-likelihood: -218.7563
  Fixed: distance ~ age + Sex 
  ...

 predict(fm2,  Orthodont)
 M01  M01  M01  M01  ... 
25.39237 26.71274 28.03311 29.35348 21.61052 ...
 

 fm2 <- lm(mod, data = Orthodont)
 predict(fm2,  Orthodont)
       1        2        3        4 ...
22.98819 24.30856 25.62894 26.94931 ...

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] lm/lme cross-validation

2005-05-31 Thread Arne.Muller
Hello,

is there a special package/method to cross-validate linear fixed effects and 
mixed effects models (from lme)? I've tried cv.glm on an lme fit (hoping that it 
might deal with any kind of linear model ...), but it raises an error:

Error in eval(expr, envir, enclos) : couldn't find function "lme.formula"

so I guess it doesn't deal with an lme.

I've realized that randomly removing some rows from the data frame used for 
lme strongly changes the estimates and reduces the correlation between 
fitted and actual values. Therefore I'd like to get a more realistic view of 
the prediction performance.
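
For illustration, a rough sketch of a manual k-fold cross-validation for an lme
fit (data frame 'd', response 'y', grouping factor 'group' and the model formula
are placeholders; every group has to appear in the training folds as well):

library(nlme)
k <- 10
folds <- sample(rep(1:k, length.out = nrow(d)))
err <- numeric(k)
for (i in 1:k) {
    train <- d[folds != i, ]
    test  <- d[folds == i, ]
    fit   <- lme(y ~ x, random = ~ 1 | group, data = train)
    pred  <- predict(fit, newdata = test)
    err[i] <- mean((test$y - pred)^2)   # fold-wise mean squared error
}
mean(err)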

Any ideas are welcome,

+thanks,

Arne

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] error in plot.lmList

2005-05-13 Thread Arne.Muller
Hello,

in R-2.1.0 I'm trying to produce trellis plots from an lmList object as 
described in the help for plot.lmList. I can generate the plots from the help, 
but on my own data plotting fails with an error message that I cannot interpret 
(please see below). Any hints are greatly appreciated.

kind regards,

Arne

 dim(d)
[1] 575   4
 d[1:3,]
  Level_of_Expression SSPos1 SSPos19 Method
1                11.9      G       A   bDNA
2                24.7      T       T   bDNA
3                 9.8      C       T   bDNA
 fm <- lmList(Level_of_Expression ~ SSPos1 + SSPos19 | Method, data=d)
 fm
Call:
  Model: Level_of_Expression ~ SSPos1 + SSPos19 | Method 
   Data: d 

Coefficients:
           (Intercept)   SSPos1C    SSPos1G   SSPos1T SSPos19C SSPos19G   SSPos19T
bDNA          25.75211 -6.379701  -9.193304 10.371056 24.32171 24.06107  9.7357724
Luciferase    23.79947  4.905679  -7.747861  8.112779 48.95151 48.15064 -0.2646783
RT-PCR        56.08985 -7.352206 -15.896556 -2.712313 19.91967 24.28425 -2.2317071
Western       14.03876  2.777038 -14.113157 -7.804959 24.62684 25.50382  8.3864782

Degrees of freedom: 575 total; 547 residual
Residual standard error: 25.39981
 plot(fm, Level_of_Expression ~ fitted(.))
Error in plot.lmList(fm, Level_of_Expression ~ fitted(.)) : 
    Object "cF" not found

what is object cF ...?

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] casting lm.fit output to an lm object

2005-01-06 Thread Arne.Muller
Hello,

Is it possible to cast the output of lm.fit to an lm object? I've 10,000 linear 
models for a gene expression experiment, all of which have the same model 
matrix. Maybe calling lm.fit on a model matrix and a data vector is faster than 
lm. I'd like to use each fit for an anova as well as for comparing different 
models via anova.
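
For illustration, one alternative sketch (not a cast of lm.fit output; 'expr' and
the factors are placeholders): lm() accepts a matrix response, so all genes
sharing one model matrix can be fit in a single call.

Y   <- as.matrix(expr)            # samples in rows, genes in columns
fit <- lm(Y ~ treatment + batch)  # one fit, common design for all genes
coef(fit)[, 1:3]                  # coefficients of the first 3 genes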

kind regards,

Arne

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Re: The hidden costs of GPL software?

2004-11-18 Thread Arne.Muller
[...]
 I am a biologist coming to R via Bioconductor. I have no computer 
 background in computer sciences and only basic undergraduate training 
 level in statistics.
 
 I have used R with great pleasure and great pains. The most difficult 
 thing is to know what functions to use - sometimes I know that one 
 function is most likely available, but there's really no easy way to 
 get it (yes, even going to the archives and reading the help files). 
 I feel that more examples in the help files would definitely be a 
 good way to fully understand the potencial of the functions. I know 
 how difficult this is to do and how much of a time sink it must be.

Yes, I often have the same problem when it comes to programming in R (data 
manipulation, formatting etc ...). When thinking about a solution, I often come 
up with something slow and complicated. A posting to this list usually reveals 
a very simple solution thanks to a function that I didn't find when exploring 
help, help.search and the archives (and thanks to those who give me the hint 
;-). However, I don't know how to improve this, i.e. how to implement a more 
sophisticated help.search. Maybe the keywords in the help files or some kind of 
free text mining would help - well, maybe this is a bit over the top.

On the other hand, when it comes to the statistics (I'm not a statistician) 
and the minimal formatting of data etc., I think that developing an 
understanding of the stats itself is the main problem and a GUI doesn't help 
very much with this. Once the basic understanding is there (which one needs 
anyway, even with a GUI), the rest is not too difficult. In addition I usually 
need to script the calculations for many different datasets, and again most 
GUIs are bad at repeating tasks systematically.

I've spent quite some time learning R (and I haven't stopped yet ;-), but 
it's definitely worth it. As a scientist I appreciate it, and since it is a 
tool that I use often, I would not exchange the command line for any GUI.

This list and the many books and manuals (mentioned in the other postings here) 
do a pretty good job in teaching R!

kind regards,

Arne

[...]

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] printing to stderr

2004-11-10 Thread Arne.Muller
Hello,

is it possible to configure the print function to print to stderr?
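
For reference, a few base-R possibilities (a sketch):

message("a note on stderr")                       # message() always writes to stderr
writeLines(capture.output(print(summary(cars))), con = stderr())
sink(stderr()); print(summary(cars)); sink()      # redirect print() output itself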

kind regards,

Arne

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Boxplot, space to axis

2004-09-30 Thread Arne.Muller
Hello,

I've created a boxplot with 84 boxes. So far everything is as I expect, but there is 
quite some space between the 1st box and axis 2 and between the last box and axis 4. 
Since 84 boxes get very slim anyway, I'd like to distribute as much of the horizontal 
space as possible over the x-axis.

Maybe I've forgotten about a graphics parameter?

Thanks for your help,

Arne

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Boxplot, space to axis

2004-09-30 Thread Arne.Muller
Hello Deepayan,

thanks for your suggestion, xaxs='i' works, but it leaves no space at all. I thought 
this might be configurable by a real value.
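
For illustration, a sketch (made-up data; the 0.5 is the amount of extra space,
in box positions, added on each side):

par(xaxs = 'i')
boxplot(split(rnorm(840), rep(1:84, each = 10)), xlim = c(0.5, 84.5))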

kind regards,

Arne

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] Behalf Of Deepayan Sarkar
 Sent: 30 September 2004 17:12
 To: [EMAIL PROTECTED]
 Cc: Muller, Arne PH/FR
 Subject: Re: [R] Boxplot, space to axis
 
 
 On Thursday 30 September 2004 09:41, [EMAIL PROTECTED] wrote:
  Hello,
 
  I've created a boxplot with 84 boxes. So far everything is as I
  expect, but there is quite some space between the 1st box and axis 2
  and the last box and axis 4. Since 84 boxes get very slim anyway I'd
  like to distribute as much of the horizontal space over the x-axis.
 
  Maybe I've forgotten about a graphics parameter?
 
 Perhaps par(xaxs = "i") ?
 
 Deepayan
 
 __
 [EMAIL PROTECTED] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html


__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] strange tickmarks placing in image

2004-08-03 Thread Arne.Muller
Hello,

I've a problem aligning tickmarks to an image. I've created a correlation matrix for 
84 datasets. I'm visualizing the matrix as an image with colour coding according to 
the correlation coefficient.

The 84 datasets are distributed over three factors, but the design is unbalanced, so 
that the tickmarks and the labels for the axis are not evenly distributed. A regular 
grid via the 'grid' function aligns perfectly with the image cells, but the tickmarks 
via axis are slightly shifted and not aligned perfectly with the image cells. The 
offset is even stronger for the y-axis. The thing is that I don't want 84 labels on 
the axis; it's enough to have one label for each of the different factor level 
combinations, which results in 28 labels.

Maybe you have an idea how to setup the command to align the tick marks properly.

thanks for your help, kind regards,

Arne


Here are my commands:

 library(marrayPlots) # for the colors
 col <- maPalette(low='white', high='darkred', k=50)
 par(ps=8, cex=1, mar=c(1,5,5,1)) # space needed for labels @ axis 1 and 3

# x and y range from 1 to 84, x is the correlation matrix (dim = 84x84)
 image(1:84, 1:84, x, col=col, xaxt='n', yaxt='n', xlab='', ylab='')

# set up the axis, 28 labels, distributed un-evenly over the image axis
 axis(3, i, labels=names(l), las=2, tick=T)
 axis(2, i, labels=names(l), las=2, tick=T)
 grid(84, col='black', lty='solid') # grids each of the 84 cells

# this is where the labels come from; the numbers indicate the replicates
# per factor-level combination
 l
NEW:4:0   NEW:4:100   NEW:4:250   NEW:4:500  NEW:4:1000NEW:24:0 
  3   3   3   3   3   3 
 NEW:24:100  NEW:24:250  NEW:24:500 NEW:24:1000 OLD:4:0   OLD:4:100 
  3   3   3   3   4   3 
  OLD:4:250   OLD:4:500  OLD:4:1000OLD:24:0  OLD:24:100  OLD:24:250 
  2   3   3   4   3   2 
 OLD:24:500 OLD:24:1000 PRG:4:0   PRG:4:100   PRG:4:250  PRG:4:1000 
  3   3   3   3   3   3 
   PRG:24:0  PRG:24:100  PRG:24:250 PRG:24:1000 
  3   3   3   3 

# these are the positions along the axis for the tick marks,
# replicates from 1 to 3 (replicates of one factor-level combination), 4 to 6  
# ...
 i
 [1]  3  6  9 12 15 18 21 24 27 30 34 37 39 42 45 49 52 54 57 60 63 66 69 72 75
[26] 78 81 84

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] binning a vector

2004-07-26 Thread Arne.Muller
Hello,

I was wondering whether there's a function in R that takes two vectors (of the same 
length) as input and computes mean values for bins (intervals) or even a sliding 
window over these vectors.

I've several x/y data sets (input/response) that I'd like to plot together. Say the 
x-data for one data set go from -5 to 14 with 12,000 values, then I'd like to bin the 
x-vector in steps of +1 and calculate and plot the mean of the x-values and the 
y-values within each bin.
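
For illustration, a minimal sketch with made-up data: bin x in steps of 1 with
cut() and average x and y within each bin using tapply().

x <- runif(12000, -5, 14)
y <- sin(x) + rnorm(12000, sd = 0.1)
b  <- cut(x, breaks = seq(-5, 14, by = 1))
xm <- tapply(x, b, mean)   # mean x per bin
ym <- tapply(y, b, mean)   # mean y per bin
plot(xm, ym, type = 'b')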

I was browsing the R docs but couldn't find anything appropriate.

thanks for hints + kind regards,

Arne

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] unbalanced design for anova with low number of replicates

2004-06-28 Thread Arne.Muller
Hello,

I'm wondering what's the best way to analyse an unbalanced design with a low number of 
replicates. I'm not a statistician, and I'm looking for some direction on this 
problem.

I've a 2 factor design:

Factor batch with 3 levels, and factor dose within each batch with 5 levels. Dose 
level 1 in batch one is replicated 4 times, level 3 is replicated only 2 times; all 
other levels are replicated 3 times, except for batch level 3, for which dose 4 is 
missing. 

I've realised that the order of the factors is critical for the outcome of the anova 
(using lm and anova).

I guess the impact wouldn't be strong if there was a reasonably large number of 
replicates within each cell (even though not balanced). However, since I've only 0 to 
4 replicates I'm worried that the standard anova may not be the way to go.

Are there special packages for unbalanced designs like this?

kind regards,

Arne

--
Arne Muller, Ph.D.
Toxicogenomics, Aventis Pharma
arne dot muller domain=aventis com

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Perl--R interface

2004-06-23 Thread Arne.Muller
Hi,

look at http://www.omegahat.org/RSPerl/index.html. 

regards,

Arne

--
Arne Muller, Ph.D.
Toxicogenomics, Aventis Pharma
arne dot muller domain=aventis com

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] Behalf Of XIAO LIU
 Sent: 23 June 2004 17:11
 To: [EMAIL PROTECTED]
 Subject: [R] Perl--R interface
 
 
 R users:
 
 My R is 1.8.1 in Linux.  How can I call R in Perl process? 
 And call Perl from R?
 
 Thanks
 
 Xiao
 
 __
 [EMAIL PROTECTED] mailing list
 https://www.stat.math.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html


__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] help with memory greedy storage

2004-05-14 Thread Arne.Muller
Hello,

I've a problem with a self-written routine taking a lot of memory (1.2Gb). Maybe you 
can suggest some enhancements; I'm pretty sure that my implementation is not optimal 
...

I'm creating many linear models and store coefficients, anova p-values ... all I need, 
in different lists which are then finally returned in a list (list of lists).

The input is a matrix with about 100,000 rows and 84 columns. The routine probeDf below 
creates a data frame that assigns the 84 samples (columns) to the different factors, not 
just for one row but for several rows, depending on what emat[which(rows == g),] returns, 
and a new factor ('probe') is generated. This results in a 1344 by 6 data frame.

Example data frame returned by probeDf:

       Value batch time  dose array probe
1   2.317804   NEW  24h 000mM     1     1
2   2.495390   NEW  24h 000mM     2     1
3   2.412247   NEW  24h 000mM     3     1
...
144 8.851469   OLD  04h 100mM    60     2
145 8.801430   PRG  24h 000mM    61     2
146 8.308224   PRG  24h 000mM    62     2
...

This data frame is not the problem, since it gets generated on-the-fly per gene and is 
discarded afterwards (it just takes some time to generate it).

Here comes the problematic routine:

### emat: matrix, model: formula for lm, contr: optional contrasts
probe.fit <- function(emat, facts, model, contr=NULL)
{
    rows  <- rownames(emat)
    genes <- unique(rows)
    l <- length(genes)
    ### generate proper labels (names) for the anova p-values
    difflabels <- attr(terms(model), "term.labels")
    aov    <- list() # anova p-values for factors + interactions
    coef   <- list() # lm coefficients
    coefp  <- list() # p-values for coefficients
    rsq    <- list() # R-squared of fit
    fitted <- list() # fitted values
    value  <- list() # orig. values (used with fitted to get residuals)

    for ( g in genes ) { # loop over 12,000 genes
        ### g is the name that identifies 14 to 16 rows in emat
        ### d is the data frame for the lm
        d <- probeDf(emat[which(rows == g),], facts)
        fit     <- lm(model, data = d, contrasts=contr)
        fit.sum <- summary(fit)
        aov[[g]] <- as.vector(na.omit(anova(fit)$'Pr(>F)'))
        names(aov[[g]]) <- difflabels
        coef[[g]]   <- coef(fit)[-1]
        coefp[[g]]  <- coef(fit.sum)[-1, 'Pr(>|t|)']
        rsq[[g]]    <- fit.sum$'r.squared'
        value[[g]]  <- d$Value
        fitted[[g]] <- fitted(fit)
    }
    list(aov=aov, coefs=coef, coefp=coefp, rsq=rsq,
         fitted=fitted, values=value)
}

### create a data frame from a matrix (usually 16 rows and 84 columns)
### and a list of factors. Basically this repeats the factors 16 times
### (for each row in the matrix). This results in a data frame with 84*16
### rows and as many columns as there are factors + 2 (probe factor + value
### to be modeled later)
probeDf <- function(emat, facts) {
    df <- NULL
    n <- 1
    nsamp <- ncol(emat)
    for ( i in 1:nrow(emat) ) {
        values <- c(t(emat[i,]))
        df.new <- data.frame(Value = values, facts, probe = rep(n, nsamp))
        n <- n + 1
        if ( !is.null(df) ) {
            df <- rbind(df, df.new)
        } else {
            df <- df.new
        }
    }
    df$probe <- as.factor(df$probe)
    df
}

If I remove coef, coefp, value and fitted from the loop in probe.fit the memory usage 
is moderate.

The problem is that each of the 12,000 genes contributes 148 coefficients (the model 
contains quite a few factors) and p-values, and the fitted and value vectors are about 
1300 elements long. I couldn't find a more compact form of storage that is still easy 
to explore afterwards.

Suggestions on how to get this done more efficiently (in terms of memory) are 
gratefully received.
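
For illustration, a sketch of one more compact storage scheme (same loop as above,
only the anova p-values shown): pre-allocated matrices with one row per gene avoid
the per-element overhead of thousands of small named vectors in nested lists.

ngene <- length(genes)
aovp  <- matrix(NA, nrow = ngene, ncol = length(difflabels),
                dimnames = list(genes, difflabels))
for ( g in genes ) {
    # ... fit the model for gene g as in probe.fit above ...
    aovp[g, ] <- as.vector(na.omit(anova(fit)$'Pr(>F)'))
}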

 kind regards,

 Arne

--
Arne Muller, Ph.D.
Toxicogenomics, Aventis Pharma
arne dot muller domain=aventis com

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] storage of lm objects in a database

2004-05-13 Thread Arne.Muller
Hello,

I'd like to use DBI to store lm objects in a database. I have to analyze many linear 
models and I cannot keep them all in a single R session (not enough memory). Also it'd 
be nice to have them persistent.

Maybe it's possible to create a compact binary representation of the objects (the kind 
of format created by save), so that one doesn't need to write a conversion 
routine for these objects (or maybe there's already a conversion available for lm?). I 
assume that the data do not need to be analyzed with any software other than R.
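
For illustration, a sketch (table and column names made up; the actual insert
syntax depends on the DBI driver): serialize() produces the same compact binary
format that save() uses, as a raw vector that can be stored in a BLOB column and
restored with unserialize().

fit <- lm(dist ~ speed, data = cars)
bin <- serialize(fit, connection = NULL)   # raw vector, suitable for a BLOB
# store 'bin' via the driver's insert mechanism, retrieve it later, then:
fit2 <- unserialize(bin)                   # back to a full lm object
anova(fit2)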

I'm happy for any suggestions and links to get some more info on this.

kind regards,

Arne

--
Arne Muller, Ph.D.
Toxicogenomics, Aventis Pharma
arne dot muller domain=aventis com

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] R versus SAS: lm performance

2004-05-11 Thread Arne.Muller
Hello,

A colleague of mine has compared the runtime of a linear model + anova in SAS and S+. 
He got the same results, but SAS took a bit more than a minute whereas S+ took 17 
minutes. I've tried it in R (1.9.0) and it took 15 min. Neither machine ran out of 
memory, and I assume that all machines have similar hardware, but the S+ and SAS 
machines are on Windows whereas the R machine is Redhat Linux 7.2.

My question is whether I'm doing something wrong (technically) in calling the lm 
routine, or (if not), how I can optimize the call to lm or even use an alternative to 
lm. I'd like to run about 12,000 of these models in R (for a gene expression experiment 
- one model per gene), which would take far too long.

I've run the following code in R (and S+):

 options(contrasts=c('contr.helmert', 'contr.poly'))

The 1st column is the value to be modeled, and the others are factors.

 names(df.gene1data) <- c("Va", "Ba", "Ti", "Do", "Ar", "Pr")
 df[c(1:2,1343:1344),]
           Va    Do  Ti  Ba Ar Pr
1    2.317804 000mM 24h NEW  1  1
2    2.495390 000mM 24h NEW  2  1
8315 2.979641 025mM 04h PRG 83 16
8415 4.505787 000mM 04h PRG 84 16

this is a dataframe with 1344 rows.

x <- Sys.time();
wlm <- lm(Va ~
Ba+Ti+Do+Pr+Ba:Ti+Ba:Do+Ba:Pr+Ti:Do+Ti:Pr+Do:Pr+Ba:Ti:Do+Ba:Ti:Pr+Ba:Do:Pr+Ti:Do:Pr+Ba:Ti:Do:Pr+(Ba:Ti:Do)/Ar,
 data=df, singular=T);
difftime(Sys.time(), x)

Time difference of 15.3 mins

 anova(wlm)
Analysis of Variance Table

Response: Va
             Df Sum Sq Mean Sq   F value    Pr(>F)
Ba            2    0.1     0.1    0.4262  0.653133
Ti            1    2.6     2.6   16.5055 5.306e-05 ***
Do            4    6.8     1.7   10.5468 2.431e-08 ***
Pr           15 5007.4   333.8 2081.8439 < 2.2e-16 ***
Ba:Ti         2    3.2     1.6    9.8510 5.904e-05 ***
Ba:Do         7    2.8     0.4    2.5054  0.014943 *
Ba:Pr        30   80.6     2.7   16.7585 < 2.2e-16 ***
Ti:Do         4    8.7     2.2   13.5982 9.537e-11 ***
Ti:Pr        15    2.4     0.2    1.0017  0.450876
Do:Pr        60   10.2     0.2    1.0594  0.358551
Ba:Ti:Do      7    1.4     0.2    1.2064  0.296415
Ba:Ti:Pr     30    5.6     0.2    1.1563  0.259184
Ba:Do:Pr    105   14.2     0.1    0.8445  0.862262
Ti:Do:Pr     60   14.8     0.2    1.5367  0.006713 **
Ba:Ti:Do:Pr 105   15.8     0.2    0.9382  0.653134
Ba:Ti:Do:Ar  56   26.4     0.5    2.9434 2.904e-11 ***
Residuals   840  134.7     0.2

The corresponding SAS program from my colleague is:

proc glm data = the name of the data set;

class B T D A P;

model V = B T D P B*T B*D B*P T*D T*P D*P B*T*D B*T*P B*D*P T*D*P B*T*D*P A(B*T*D);

run;

Note, V = Va, B = Ba, T = Ti, D = Do, P = Pr, A = Ar of the R-example

kind regards + thanks a lot for your help,

Arne

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] R versus SAS: lm performance

2004-05-11 Thread Arne.Muller
Hello,

thanks for your reply. I've now done the profiling, and I interpret it as most of the 
time being spent in the fortran routine(s):

Each sample represents 0.02 seconds.
Total run time: 920.21999453 seconds.

Total seconds: time spent in function and callees.
Self seconds: time spent in function alone.

   %   total   %   self
 totalseconds selfsecondsname
100.00920.22  0.02  0.16 lm
 99.96919.88  0.10  0.88 lm.fit
 99.74917.84 99.74917.84 .Fortran
  0.07  0.66  0.02  0.14 storage.mode<-
  0.06  0.52  0.00  0.00 eval
  0.06  0.52  0.04  0.34 as.double
  0.02  0.22  0.02  0.22 colnames<-
  0.02  0.20  0.02  0.20 structure
  0.02  0.18  0.02  0.18 model.matrix.default
  0.02  0.18  0.02  0.18 as.double.default
  0.02  0.18  0.00  0.00 model.matrix
  0.01  0.08  0.01  0.08 list

   %   self%   total
 self secondstotalsecondsname
 99.74917.84 99.74917.84 .Fortran
  0.10  0.88 99.96919.88 lm.fit
  0.04  0.34  0.06  0.52 as.double
  0.02  0.22  0.02  0.22 colnames<-
  0.02  0.20  0.02  0.20 structure
  0.02  0.18  0.02  0.18 as.double.default
  0.02  0.18  0.02  0.18 model.matrix.default
  0.02  0.16100.00920.22 lm
  0.02  0.14  0.07  0.66 storage.mode<-
  0.01  0.08  0.01  0.08 list

I guess this actually means I cannot do anything about it ... other than maybe 
splitting the problem into different (independent) parts - which I actually may be 
able to do.

Regarding the usage of lm.fit instead of lm, this might be a good idea, since I am 
using the same model.matrix for all fits! However, I'd need to recreate an lm object 
from the output, because I'd like to run the anova function on this. I'll first do 
some profiling on lm versus lm.fit for the 12,000 models ...

kind regards + thanks again for your help,

Arne

--
Arne Muller, Ph.D.
Toxicogenomics, Aventis Pharma
arne dot muller domain=aventis com

 -Original Message-
 From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]
 Sent: 11 May 2004 09:08
 To: Muller, Arne PH/FR
 Cc: [EMAIL PROTECTED]
 Subject: Re: [R] R versus SAS: lm performance
 
 
 The way to time things in R is system.time().
 
 Without knowing much more about your problem we can only 
 guess where R is 
 spending the time.  But you can find out by profiling -- see 
 `Writing R 
 Extensions'.
 
 If you want multiple fits with the same design matrix (do you?) you 
 could look at the code of lm and call lm.fit repeatedly yourself.
 
 On Mon, 10 May 2004 [EMAIL PROTECTED] wrote:
 
  Hello,
  
  A collegue of mine has compared the runtime of a linear 
 model + anova in SAS and S+. He got the same results, but SAS 
 took a bit more than a minute whereas S+ took 17 minutes. 
 I've tried it in R (1.9.0) and it took 15 min. Neither 
 machine run out of memory, and I assume that all machines 
 have similar hardware, but the S+ and SAS machines are on 
 windows whereas the R machine is Redhat Linux 7.2.
  
  My question is if I'm doing something wrong (technically) 
 calling the lm routine, or (if not), how I can optimize the 
 call to lm or even using an alternative to lm. I'd like to 
 run about 12,000 of these models in R (for a gene expression 
 experiment - one model per gene, which would take far too long).
  
  I've run the follwong code in R (and S+):
  
   options(contrasts=c('contr.helmert', 'contr.poly'))
  
  The 1st colum is the value to be modeled, and the others 
 are factors.
  
   names(df.gene1data) - c(Va, Ba, Ti, Do, Ar, Pr)
   df[c(1:2,1343:1344),]
 VaDo  Ti  Ba ArPr
  12.317804 000mM 24h NEW  1 1
  22.495390 000mM 24h NEW  2 1
  8315 2.979641 025mM 04h PRG 8316
  8415 4.505787 000mM 04h PRG 8416
  
  this is a dataframe with 1344 rows.
  
  x - Sys.time();
  wlm - lm(Va ~
  
 Ba+Ti+Do+Pr+Ba:Ti+Ba:Do+Ba:Pr+Ti:Do+Ti:Pr+Do:Pr+Ba:Ti:Do+Ba:Ti
 :Pr+Ba:Do:Pr+Ti:Do:Pr+Ba:Ti:Do:Pr+(Ba:Ti:Do)/Ar, data=df, singular=T);
  difftime(Sys.time(), x)
  
  Time difference of 15.3 mins
  
   anova(wlm)
  Analysis of Variance Table
  
  Response: Va
   Df Sum Sq Mean Sq   F valuePr(F)
  Ba20.1 0.10.4262  0.653133
  Ti12.6 2.6   16.5055 5.306e-05 ***
  Do46.8 1.7   10.5468 2.431e-08 ***
  Pr   15 5007.4   333.8 2081.8439  2.2e-16 ***
  Ba:Ti 23.2 1.69.8510 5.904e-05 ***
  Ba:Do 72.8 0.42.5054  0.014943 *  
  Ba:Pr30   80.6 2.7   16.7585  2.2e-16 ***
  Ti:Do 48.7 2.2   13.5982 9.537e-11 ***
  Ti:Pr152.4 0.21.0017  0.450876
  Do:Pr60   10.2   

RE: [R] R versus SAS: lm performance

2004-05-11 Thread Arne.Muller
Thanks all for your help. There seems to be a lot I can try to speed up the fits. 
However, I'd like to go for a much simpler model, which I think is justified by the 
experiment itself; e.g. I may think about removing the nesting (Ba:Ti:Do)/Ar.

The model matrix has 1344 rows and 2970 columns, and the rank of the matrix is 504. 
Therefore I think I should reformulate the model.

I was just struck by the massive difference in performance when my colleague told me 
about the difference between SAS and S+.

kind regards,

Arne

--
Arne Muller, Ph.D.
Toxicogenomics, Aventis Pharma
arne dot muller domain=aventis com

 -Original Message-
 From: Liaw, Andy [mailto:[EMAIL PROTECTED]
 Sent: 11 May 2004 14:20
 To: Muller, Arne PH/FR; [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Subject: RE: [R] R versus SAS: lm performance
 
 
 I tried the following on an Opteron 248, R-1.9.0 w/Goto's BLAS:
 
  y <- matrix(rnorm(14000*1344), 1344)
  x <- matrix(runif(1344*503),1344)
  system.time(fit <- lm(y~x))
 [1] 106.00  55.60 265.32   0.00   0.00
 
 The resulting fit object is over 600MB.  (The coefficient component is a 504
 x 14000 matrix.)
 
 If I'm not mistaken, SAS sweeps on the extended cross-product matrix to fit
 regression models.  That, I believe, is usually faster than doing a QR
 decomposition on the model matrix itself, but there are trade-offs.  You
 could try what Prof. Bates suggested.
 
 Andy
 
  From: [EMAIL PROTECTED]
  
  Hello,
  
  thanks for your reply. I've now done the profiling, and I 
  interpret that the most time is spend in the fortran routine(s):
  
  Each sample represents 0.02 seconds.
  Total run time: 920.21999453 seconds.
  
  Total seconds: time spent in function and callees.
  Self seconds: time spent in function alone.
  
 %   total   %   self
   totalseconds selfsecondsname
  100.00920.22  0.02  0.16 lm
   99.96919.88  0.10  0.88 lm.fit
   99.74917.84 99.74917.84 .Fortran
0.07  0.66  0.02  0.14 storage.mode-
0.06  0.52  0.00  0.00 eval
0.06  0.52  0.04  0.34 as.double
0.02  0.22  0.02  0.22 colnames-
0.02  0.20  0.02  0.20 structure
0.02  0.18  0.02  0.18 model.matrix.default
0.02  0.18  0.02  0.18 as.double.default
0.02  0.18  0.00  0.00 model.matrix
0.01  0.08  0.01  0.08 list
  
 %   self%   total
   self secondstotalsecondsname
   99.74917.84 99.74917.84 .Fortran
0.10  0.88 99.96919.88 lm.fit
0.04  0.34  0.06  0.52 as.double
0.02  0.22  0.02  0.22 colnames-
0.02  0.20  0.02  0.20 structure
0.02  0.18  0.02  0.18 as.double.default
0.02  0.18  0.02  0.18 model.matrix.default
0.02  0.16100.00920.22 lm
0.02  0.14  0.07  0.66 storage.mode-
0.01  0.08  0.01  0.08 list
  
  I guess this actually means I cannot do anything about it ... 
  other than maybe splitting the problem into different 
  (independaent parts - which I actually may be able to).
  
  Regarding the usage of lm.fit instead of lm, this might be a 
  good idea, since I am using the same model.matrix for all 
  fits! However, I'd need to recreate an lm object from the 
  output, because I'd like to run the anova function on this. 
  I'll first do some profiling on lm versus lm.fit for the 
  12,000 models ...
  
  kind regards + thanks again for your help,
  
  Arne
  
  --
  Arne Muller, Ph.D.
  Toxicogenomics, Aventis Pharma
  arne dot muller domain=aventis com
  
   -Original Message-
   From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]
   Sent: 11 May 2004 09:08
   To: Muller, Arne PH/FR
   Cc: [EMAIL PROTECTED]
   Subject: Re: [R] R versus SAS: lm performance
   
   
   The way to time things in R is system.time().
   
   Without knowing much more about your problem we can only 
   guess where R is 
   spending the time.  But you can find out by profiling -- see 
   `Writing R 
   Extensions'.
   
   If you want multiple fits with the same design matrix (do 
 you?) you 
   could look at the code of lm and call lm.fit repeatedly yourself.
   
   On Mon, 10 May 2004 [EMAIL PROTECTED] wrote:
   
Hello,

A collegue of mine has compared the runtime of a linear 
   model + anova in SAS and S+. He got the same results, but SAS 
   took a bit more than a minute whereas S+ took 17 minutes. 
   I've tried it in R (1.9.0) and it took 15 min. Neither 
   machine run out of memory, and I assume that all machines 
   have similar hardware, but the S+ and SAS machines are on 
   windows whereas the R machine is Redhat Linux 7.2.

My question is if I'm doing something wrong (technically) 
   

[R] strange result with contrasts

2004-04-20 Thread Arne.Muller
Hello,

I'm trying to reproduce some SAS results with R (after I got suspicious about the 
results in R). I'm struggling with the contrasts in a linear model.

I've got three factors

 d$dose <- as.factor(d$dose)   # 5 levels
 d$time <- as.factor(d$time)   # 2 levels
 d$batch <- as.factor(d$batch) # 3 levels

the data frame d contains 82 rows. There are 2 to 4 replicates of each dose within 
each time point and each batch. One dose is completely missing from one batch.

I then generate Dunnett contrasts using the multicomp library:

 contrasts(d$dose) <- contr.Dunnett(levels(d$dose), 1)
 contrasts(d$time) <- contr.Dunnett(levels(d$time), 1)
 contrasts(d$batch) <- contr.Dunnett(levels(d$batch), 1)

For the moment I'm just looking at the dose effects of the complete model:

 summary(lm(value ~ dose * time * batch, data = d))$coefficients[1:5,]
                   Estimate Std. Error     t value      Pr(>|t|)
(Intercept)      6.80211741 0.01505426 451.8399839 1.962247e-101
dose010mM-000mM -0.03454211 0.04113846  -0.8396549  4.046723e-01
dose025mM-000mM -0.01972550 0.04288981  -0.4599111  6.473607e-01
dose050mM-000mM -0.12015983 0.05356935  -2.2430704  2.886726e-02  <- significant
dose100mM-000mM  0.01252061 0.04113846   0.3043529  7.619872e-01

A colleague of mine has run the same data through a SAS program (listed below)

proc glm data = dftest;
  class dose time batch;
  model value = dose|time|batch;
  means dose / dunnett ('000mM');
  lsmeans dose /pdiff singular=1; 
run;

Giving the following p-values:
                  Pr(>|t|) 
  dose010mM-000mM  0.4047
  dose025mM-000mM  0.6474
  dose050mM-000mM  0.5745  <---
  dose100mM-000mM  0.7620

The p-values are the same except for the one indicated.

A stripchart of the data in R shows that dose050mM-000mM should not be significant 
(it doesn't look different from e.g. dose025mM-000mM).

Do you have any suggestions as to what I'm doing wrong here (assuming that I believe 
the SAS result)? Any hints on what I can do to further analyse this problem?

Many thanks for your help,
+regards,

Arne

--
Arne Muller, Ph.D.
Toxicogenomics, Aventis Pharma
arne dot muller domain=aventis com

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Storing p-values from a glm

2004-04-06 Thread Arne.Muller
Hi,

for example one could do it this way:

v <- summary(fit)$coefficients[,4]

the coefficients component is a matrix, and with the 4 you refer to the
p-value column (at least for lm - summary(glm) may produce slightly
different output).

to skip the intercept (1st row): v <- summary(glmfit)$coefficients[-1,4]

hope this helps,

Arne

--
Arne Muller, Ph.D.
Toxicogenomics, Aventis Pharma
arne dot muller domain=aventis com

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] Behalf Of Roy Sanderson
 Sent: 06 April 2004 14:36
 To: [EMAIL PROTECTED]
 Subject: [R] Storing p-values from a glm
 
 
 Hello
 
 I need to store the P-statistics from a number of glm 
 objects.  Whilst it's
 easy to display these on screen via the summary() function, 
 I'm not clear
 on how to extract the P-values and store them in a vector.
 
 Many thanks
 Roy
 
 --
 --
 Roy Sanderson
 Centre for Life Sciences Modelling
 Porter Building
 University of Newcastle
 Newcastle upon Tyne
 NE1 7RU
 United Kingdom
 
 Tel: +44 191 222 7789
 
 [EMAIL PROTECTED]
 http://www.ncl.ac.uk/clsm
 
 __
 [EMAIL PROTECTED] mailing list
 https://www.stat.math.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] number point under-flow

2004-02-04 Thread Arne.Muller
Hello,

I've come across the following situation in R-1.8.1 (compiled and running under
RedHat 7.1):

 phyper(24, 514, 5961-514, 53, lower.tail=T)
[1] 1
 phyper(24, 514, 5961-514, 53, lower.tail=F)
[1] -1.037310e-11

I'd expect the latter to be 0 or some very small positive number. Is this a
numeric under-flow in the calculation? Do you think I'm safe if I just set the
result to 0 in these cases?
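As an aside, a tiny sketch (same numbers as in the call above) of a way to cross-check
the upper tail by summing the hypergeometric density directly, which cannot come out
negative, plus a defensive clamp if one decides the tiny negative value is just noise:

  ## P(X > 24) for X ~ Hypergeometric(m = 514, n = 5961 - 514, k = 53)
  sum(dhyper(25:53, 514, 5961 - 514, 53))
  ## clamp the direct call at zero
  max(0, phyper(24, 514, 5961 - 514, 53, lower.tail = FALSE))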

kind regards,

Arne

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] number point under-flow

2004-02-04 Thread Arne.Muller
Hi,

yes, I did compile it with gcc 2.96 ... . Do you have an estimate of how bad
this error is, e.g. how much it affects the calculations in R?

kind regards,

Arne

 -Original Message-
 From: Roger D. Peng [mailto:[EMAIL PROTECTED]
 Sent: 04 February 2004 14:49
 To: Muller, Arne PH/FR
 Cc: [EMAIL PROTECTED]
 Subject: Re: [R] number point under-flow
 
 
 Did you compile with gcc-2.96?  I think there were some 
 problems with the floating point arithmetic with that 
 compiler (at least for the earlier versions released by Red 
 Hat).
 
 -roger
 
 [EMAIL PROTECTED] wrote:
  Hello,
  
  I've come across the following situation in R-1.8.1 
 (compile + running under
  RedHat 7.1):
  
  
 phyper(24, 514, 5961-514, 53, lower.tail=T)
  
  [1] 1
  
 phyper(24, 514, 5961-514, 53, lower.tail=F)
  
  [1] -1.037310e-11
  
  I'd expect the later to be 0 or some very small positive 
 number. Is this a
  number under-flow of the calculation? Do you think I'm safe 
 if I just set the
  result to 0 in these cases?
  
  kind regards,
  
  Arne
  
  __
  [EMAIL PROTECTED] mailing list
  https://www.stat.math.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html
  


__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Cochran-Mantel-Haenszel problem

2003-12-11 Thread Arne.Muller
Hello,

I've tried to analyze some data with a CMH test. My 3 dimensional contingency
tables are 2x2xN where N is usually between 10 and 100.

The problem is that there may be 2 strata with opposite counts (the 2x2
contingency tables for these are reversed), producing opposite odds ratios that
cancel out in the overall statistic. These opposite counts are very
important for my analysis, since they account for a dramatic difference.

Could you recommend alternative tests that take account of opposite counts?
Would you suggest a different strategy to analyze such data?
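For illustration, a small made-up 2x2x2 example of the situation described (two strata
with reversed tables); looking at the per-stratum odds ratios makes the cancellation
visible, while the pooled CMH test alone does not:

  ## two strata with opposite association (made-up counts)
  tab <- array(c(10, 2, 3, 9,      # stratum 1
                 3, 9, 10, 2),     # stratum 2, reversed
               dim = c(2, 2, 2))
  mantelhaen.test(tab)                                 # pooled CMH test
  apply(tab, 3, function(m) fisher.test(m)$estimate)   # stratum-wise odds ratios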

thanks a lot for your suggestions,

Arne

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] multidimensional Fisher or Chi square test

2003-12-03 Thread Arne.Muller
Hello,

Is there a test for independence available based on a multidimensional
contingency table?

I've about 300 processes, and for each of them I get counts of failures and
successes. I have two or more conditions under which I test these processes.

If I had just one process to test I could simply perform a Fisher or chi-square
test on a 2x2 contingency table, like this:

for one process:
conditionA  conditionB
ok  20  6
failed  190 156

From the table I can figure out whether the outcome (ok/failed) depends on the
condition for a single process. However, I'd like to know how different the 2
conditions are from each other considering all 300 processes, and I consider
the processes to be an additional dimension.

My H0 is that both conditions are overall (considering all processes) the
same.

Could you give me a hint as to what kind of test or package I should look into?
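For illustration, a sketch with simulated counts (the numbers and the 5-process size are
made up); besides a stratified Mantel-Haenszel test on a 2x2xN table, one common way to
test an overall condition effect across processes is a binomial glm with the process as a
blocking factor:

  set.seed(1)
  N <- 5                                 # small stand-in for the 300 processes
  d <- expand.grid(process = factor(1:N), cond = factor(c("A", "B")))
  d$ok     <- rpois(nrow(d), 15)         # made-up success counts
  d$failed <- rpois(nrow(d), 150)        # made-up failure counts
  ## 'process' absorbs process-to-process differences, 'cond' is the effect of interest
  fit <- glm(cbind(ok, failed) ~ process + cond, family = binomial, data = d)
  anova(fit, test = "Chisq")             # H0: no overall condition effect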

kind regards + thanks for your help,

Arne

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


RE: [R] significance in difference of proportions

2003-12-01 Thread Arne.Muller
Hello,

thanks for the replies to this subject. I'm using a fisher.test to test if
the proportions of my 2 samples are different (see Ted's example below).
 
The assumption was that the two samples are from the same population and that
they may contain a different number of positives (due to different
treatment). 

I may be able to figure out the true probability of getting a positive, since
for some of my experiments I know the entire population. E.g. the samples
(111 items and 10 items) come from a population of 10,000 items, and I know
that there are 200 positives in the population.

Is it possible to use the Fisher test for testing equality of proportions
and to include the known probability of finding a positive - would that make
sense at all? If the two samples come from the same population, the
probability of finding a positive shouldn't influence the test for difference of
proportions, should it?
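For illustration, if the population proportion really is known (200 positives out of
10,000), one simple option is to compare each sample against that fixed proportion with
an exact binomial test instead of comparing the two samples with each other; a minimal
sketch with the numbers quoted above:

  p0 <- 200 / 10000                # known proportion of positives in the population
  binom.test(9, 111, p = p0)       # sample A: 9 positives out of 111
  binom.test(0,  10, p = p0)       # sample B: 0 positives out of 10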

At some point I'd like to extend the statistics so that the two samples can
come from 2 different populations (with known probability for the positives).

I'm happy to receive suggestions and comments on this.

thanks a lot again for your help,

Arne 

 
 On 27-Nov-03 [EMAIL PROTECTED] wrote:
  I've 2 samples A (111 items) and B (10 items) drawn from the same
  unknown population. Within A I find 9 positives and in B 0
  positives. I'd like to know if the 2 samples A and B are different,
  ie is there a way to find out whether the number of positives is
  significantly different in A and B?
 
 Pretty obviously not, just from looking at the numbers:
 
 9 out of 111 - p = P(positive) approx = 1/10
 
 P(0 out of 10 when p = 1/10) is not unlikely (in fact = 0.35).
 
 However, a Fisher exact test will give you a respectable P-value:
 
  library(ctest)
  ?fisher.test
  fisher.test(matrix(c(102,9,10,0),nrow=2))
   [...]
   p-value = 1
   alternative hypothesis: true odds ratio is not equal to 1 
   95 percent confidence interval:
0.00 6.088391 
  fisher.test(matrix(c(102,9,9,1),nrow=2))
   p-value = 0.5926
  fisher.test(matrix(c(102,9,8,2),nrow=2))
   p-value = 0.2257
  fisher.test(matrix(c(102,9,7,3),nrow=2))
   p-value = 0.0605
  fisher.test(matrix(c(102,9,6,4),nrow=2))
   p-value = 0.01202
 
 So there's a 95% CI (0,6.1) for the odds ratio which, for
 identical probabilities of +, is 1.0 hence well within the CI.
 And, keeping the numbers for the larger sample fixed for
 simplicity, you have to go quite a way with the smaller one to get
 a result significant at 5%:
 
 (102,9):(7,3) - P = 0.06
 (102,9):(6,4) - P = 0.01
 
 and, to have 80% power (0.8 probability of this event), the
 probability of + in the second sample would have to be as
 high as 0.41.
 
 Conclusion: your second sample size is quite inadequate except
 to detect rather large differences between the true proportions
 in the two cases!
 
 Best wishes,
 Ted.
 
 
 
 E-Mail: (Ted Harding) [EMAIL PROTECTED]
 Fax-to-email: +44 (0)870 167 1972
 Date: 27-Nov-03   Time: 17:43:00
 -- XFMail --


__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] significance in difference of proportions

2003-11-27 Thread Arne.Muller
Hello,

I'm looking for some guidance with the following problem:

I've 2 samples A (111 items) and B (10 items) drawn from the same unknown
population. Within A I find 9 positives and in B 0 positives. I'd like to
know if the 2 samples A and B are different, i.e. is there a way to find out
whether the number of positives is significantly different in A and B?

I'm currently using prop.test, but unfortunately some of my data contain
fewer than 5 items in a group (like in the example above), and the chi-square
approximation behind the test may not hold:

> prop.test(c(9,0), c(111,10))

2-sample test for equality of proportions with continuity correction

data:  c(9, 0) out of c(111, 10) 
X-squared = 0.0941, df = 1, p-value = 0.759
alternative hypothesis: two.sided 
95 percent confidence interval:
 -0.02420252  0.18636468 
sample estimates:
prop 1 prop 2 
0.08108108 0.00000000 

Warning message: 
Chi-squared approximation may be incorrect in: prop.test(c(9, 0), c(111, 10))


Do you have suggestions for an alternative test?

many thanks for your help,
+kind regards,

Arne

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] FDR in p.adjust

2003-11-03 Thread Arne.Muller
Hello,

I've a question about the fdr method in p.adjust: What is the threshold of
the FDR, and is it possible to change this threshold?

As I understand the FDR (please correct me), it adjusts the p-values so that,
among the hypotheses declared significant, less than N% (say the cutoff is 25%)
are cases where the null is in fact true.
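For illustration, a minimal sketch of how the cutoff enters: p.adjust() itself has no
built-in threshold; it returns adjusted p-values, and the FDR level is whatever value you
compare them against afterwards (made-up p-values below):

  p <- c(0.001, 0.004, 0.019, 0.095, 0.201, 0.74)   # raw p-values
  q <- p.adjust(p, method = "fdr")                  # Benjamini-Hochberg adjustment
  q
  which(q < 0.25)   # hypotheses rejected when controlling the FDR at 25%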

thanks a lot for help,
+regards,

Arne

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] why does data frame subset return vector

2003-10-18 Thread Arne.Muller
Hello,

I've a weird problem with a data frame. Basically it should be just one column
with specific names, coming from a data file (the file contains 2 columns:
one for the rownames of the data frame, the other containing numeric values).

> df.rr <- read.table("RR_anova.txt", header=T, comment.char="", row.names=1)
> df.rr[c(1,2,3),]
[1] 1.11e-16 1.11e-16 1.11e-16

Why are the rownames not displayed?

The data file itself looks like this:
> df.rr <- read.table("RR_anova.txt", header=T, comment.char="")
> df.rr[c(1,2,3),]
            QUAL   PVALUE
1    AJ224120_at 1.11e-16
2 rc_AA893000_at 1.11e-16
3 rc_AA946368_at 1.11e-16

and assigning the rownames explicitly works as I'd expect:
> rownames(df.rr) <- df.rr$'QUAL'
> df.rr[c(1,2,3),]
                         QUAL   PVALUE
AJ224120_at       AJ224120_at 1.11e-16
rc_AA893000_at rc_AA893000_at 1.11e-16
rc_AA946368_at rc_AA946368_at 1.11e-16

Ok, now they are displayed, but it's a duplication to keep the QUAL column.

Below I create a new data frame to skip the QUAL column, since it is already
a rowname.
> df.rr2 <- data.frame(PVALUE=df.rr, row.names=1)
> df.rr2[1:4,]
[1] 1.11e-16 1.11e-16 1.11e-16 1.11e-16

However, the rowname is still there ..., you just cannot see it:
> df.rr2["AJ224120_at",]
[1] 1.11e-16

The code below shows that sub-setting the df.rr data frame indeed creates a
vector rather than a data frame, whereas sub-setting the 2-column data frame
returns a new data frame (as I'd expect).
 
> df.rr[1:4,]
[1] 1.11e-16 1.11e-16 1.11e-16 1.11e-16
> is.vector(df.rr[1:4,])
[1] TRUE
> is.data.frame(df.rr[1:4,])
[1] FALSE
> df.rr <- read.table("CLO_RR_anova.txt", header=T, comment.char="")
> is.data.frame(df.rr[1:4,])
[1] TRUE

Any explanation is appreciated. There must be a good reason for this, I guess ... .
On the other hand, is there a way to force the subset of the 1-column data frame
to be a data frame itself? I'd just like to see the rownames displayed, that's it ...
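For reference, a tiny self-contained sketch of the dimension-dropping behaviour described
above (made-up values): row indexing of a one-column data frame simplifies to a vector by
default, and drop = FALSE keeps it a data frame so the rownames stay visible:

  df1 <- data.frame(PVALUE = c(0.1, 0.2, 0.3),
                    row.names = c("g1", "g2", "g3"))
  df1[1:2, ]                  # plain numeric vector, rownames not shown
  df1[1:2, , drop = FALSE]    # still a data frame, rownames displayed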

thanks alot for your help,

Arne

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] sub data frame by expression

2003-10-17 Thread Arne.Muller
Hi All,

I've the following data frame with 54 rows and 4 columns:

 x  
  Ratio  Dose Time Batch
R.010mM.04h.NEW0.02 010mM  04h   NEW
R.010mM.04h.NEW.1  0.07 010mM  04h   NEW
...
R.010mM.24h.NEW.2  0.06 010mM  24h   NEW
R.010mM.04h.OLD0.19 010mM  04h   OLD
...
R.010mM.04h.OLD.1  0.49 010mM  04h   OLD
R.100mM.24h.OLD0.40 100mM  24h   OLD

I'd like to create a sub data frame containing all rows where Batch == OLD
and keeping the 4 columns. Assume that I don't know the order of the rows
(otherwise I could just do something like x[1:20,]).

I've tried x[x$Batch == 'OLD'] or x[x[,4] == 'OLD'] but it generates errors.
So I assume I've still not really understood the philosophy of indexing ...
:-(

What's the easiest way to do this, any suggestions?
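For illustration, a small sketch on a made-up version of this data frame; logical row
indexing (note the trailing comma) and subset() both do it:

  x <- data.frame(Ratio = c(0.02, 0.19, 0.40),
                  Dose  = c("010mM", "010mM", "100mM"),
                  Time  = c("04h", "04h", "24h"),
                  Batch = c("NEW", "OLD", "OLD"))
  x[x$Batch == "OLD", ]        # rows where Batch is OLD, all 4 columns kept
  subset(x, Batch == "OLD")    # the same, a bit more readable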

thanks a lot for you help,

Arne

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


RE: [R] sub data frame by expression

2003-10-17 Thread Arne.Muller
Sorry, I just figured it out: x[x$Batch == 'OLD',] instead of x[x$Batch ==
'OLD']. I didn't know this has to be in the same format as x[1:20,], where I
had already used the comma.

sorry for posting the previous message ...

Arne


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] Behalf Of
 [EMAIL PROTECTED]
 Sent: 17 October 2003 12:12
 To: [EMAIL PROTECTED]
 Subject: [R] sub data frame by expression
 
 
 Hi All,
 
 I've the following data frame with 54 rows and 4 colums:
 
  x  
   Ratio  Dose Time Batch
 R.010mM.04h.NEW0.02 010mM  04h   NEW
 R.010mM.04h.NEW.1  0.07 010mM  04h   NEW
 ...
 R.010mM.24h.NEW.2  0.06 010mM  24h   NEW
 R.010mM.04h.OLD0.19 010mM  04h   OLD
 ...
 R.010mM.04h.OLD.1  0.49 010mM  04h   OLD
 R.100mM.24h.OLD0.40 100mM  24h   OLD
 
 I'd like to create a sub data frame containing all rows where 
 Batch == OLD
 and keeping the 4 colums. Assume that I don't know the order 
 of the rows
 (otherwise I could just do something like x[1:20,]).
 
 I've tried x[x$Batch == 'OLD'] or x[x[,4] == 'OLD'] but it 
 generates errors.
 So I assume I've still not realy understood the philosophy of 
 indexing ...
 :-(
 
 What's the easiest way to do this, any suggestions?
 
   thanks a lot for you help,
 
   Arne
 
 __
 [EMAIL PROTECTED] mailing list
 https://www.stat.math.ethz.ch/mailman/listinfo/r-help


__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


RE: [R] sub data frame by expression

2003-10-17 Thread Arne.Muller
Hi,

thanks for your replies regarding the problem to select a sub data frame by
expression. I start getting an understanding on how indexing works in R.

thanks for your replies,

Arne

 -Original Message-
 From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]
 Sent: 17 October 2003 12:38
 To: Muller, Arne PH/FR
 Cc: [EMAIL PROTECTED]
 Subject: Re: [R] sub data frame by expression
 
 
 On Fri, 17 Oct 2003 [EMAIL PROTECTED] wrote:
 
  I've the following data frame with 54 rows and 4 colums:
  
   x  
Ratio  Dose Time Batch
  R.010mM.04h.NEW0.02 010mM  04h   NEW
  R.010mM.04h.NEW.1  0.07 010mM  04h   NEW
  ...
  R.010mM.24h.NEW.2  0.06 010mM  24h   NEW
  R.010mM.04h.OLD0.19 010mM  04h   OLD
  ...
  R.010mM.04h.OLD.1  0.49 010mM  04h   OLD
  R.100mM.24h.OLD0.40 100mM  24h   OLD
  
  I'd like to create a sub data frame containing all rows 
 where Batch == OLD
  and keeping the 4 colums. Assume that I don't know the 
 order of the rows
  (otherwise I could just do something like x[1:20,]).
  
  I've tried x[x$Batch == 'OLD'] or x[x[,4] == 'OLD'] but it 
 generates errors.
 
 That subsets columns, not rows. Try x[x$Batch == "OLD",]
 
 -- 
 Brian D. Ripley,  [EMAIL PROTECTED]
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford, Tel:  +44 1865 272861 (self)
 1 South Parks Road, +44 1865 272866 (PA)
 Oxford OX1 3TG, UKFax:  +44 1865 272595
 


__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] A data frame of data frames

2003-10-16 Thread Arne.Muller
Hello,

I'm trying to set up the following data structure in R:

A data frame with 7,000 rows and 4 columns. The rownames have some special
meaning (they are names of genes). The 1st column per row is itself a data
frame, and columns 2 to 4 will keep numeric values.

The data frame contained in the 1st column will have 54 rows (with special
names) and 4 columns (1st col is a response, cols 2-4 are factors). Each of
these data frames with the response/factors will be fed into a 3-way linear
model for anova. The other columns of the 1st data frame will hold the p-values.

Basically running 7,000 anovas is very quick, but the reformatting of the data
so that it is suitable for the anova takes a long time (45 minutes). So I'd
just like to keep the generated data structure as a persistent R object.

I haven't managed to store the 2nd data frame in the 1st column of the 1st
data frame.

From other languages such as C I'd know how to set up this kind of data
structure (pointers), but I get stuck in R (I guess I'm still struggling with
the way R represents data structures).

Do you have any suggestions on how to do this?
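For illustration, a minimal sketch (made-up gene names and factors) of one way to hold
this in R: a named list carries the per-gene design data frames, a separate data frame
carries the p-values, and both can be kept as a persistent R object with save():

  genes <- c("geneA", "geneB", "geneC")              # stand-ins for the 7,000 genes
  per.gene <- lapply(genes, function(g)
      data.frame(resp  = rnorm(54),                  # response
                 dose  = gl(3, 18),                  # three made-up factors
                 time  = gl(2, 27),
                 batch = gl(2, 27)))
  names(per.gene) <- genes
  pvals <- data.frame(row.names = genes, p.dose = NA, p.time = NA, p.batch = NA)
  save(per.gene, pvals, file = "anova_input.RData")  # reload later with load()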

kind regards,

Arne

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] New R - recompiling all packages

2003-10-08 Thread Arne.Muller
Hi All,

I'm running R 1.7.1, and I've installed some additional packages such as
Bioconductor. Do I have to re-install all the additional packages when upgrading
to R 1.8.0 (i.e. are there compiled-in dependencies)?
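For what it's worth, in current versions of R a single call after the upgrade usually
suffices; checkBuilt flags packages that were built under an older R so they get rebuilt:

  update.packages(checkBuilt = TRUE, ask = FALSE)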

thanks for your help,

Arne

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] updating via CRAN and http

2003-10-08 Thread Arne.Muller
Hello,

thanks for the tips on updating packages for 1.8.0. The updating is a real
problem for me, since I have to do it sort of manually using my web browser or
wget. I'm behind a firewall that requires http/ftp authentication (username
and password) for every request it sends to a server outside our intranet.
Therefore all the nice tools for automatic updating (CRAN, CPAN ...) don't
work for me (I've tried).

I understand that the non-paranoid rest of the world can't be bothered, but
is there any intention to include such authentication in the update
procedures of R? I think for ftp it's kind of tricky, but at least for http
the authentication seems to be straightforward.
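For illustration only (the proxy host, port, user and password below are placeholders;
?download.file documents the details): R's download routines can be pointed at an
authenticating proxy via environment variables, or switched to an external wget that
already knows the credentials:

  Sys.setenv(http_proxy = "http://user:passwd@proxy.example.com:8080")  # placeholder
  options(download.file.method = "wget")   # let an already-configured wget do the work
  update.packages(ask = FALSE)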

kind regards,

Arne

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


RE: [R] updating via CRAN and http

2003-10-08 Thread Arne.Muller
Sorry, I didn't mean it the nasty way. I wouldn't have been surprised if the
R team had told me that the authentication with the firewall is my problem (i.e.
a special case that cannot be dealt with by the R team).

Yes, and of course I should have had a much closer look into the documentation.
Thanks again for the hint + please forgive!

+regards,

Arne

 -Original Message-
 From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]
 Sent: 08 October 2003 17:20
 To: Muller, Arne PH/FR
 Cc: [EMAIL PROTECTED]
 Subject: Re: [R] updating via CRAN and http
 
 
 On Wed, 8 Oct 2003 [EMAIL PROTECTED] wrote:
 
  Hello,
  
  thanks for the tips on updating packages for 1.8.0. The 
 updating is a real
  problem for me, since I've to do it sort of manually using 
 my web-browser or
  wget. I'm behind a firewall that requires http/ftp 
 authentification (username
  and passwd) for every request it sends to a server outside 
 our intranet.
  Therefore all the nice tools for automatic updating (cran, 
 cpan ...) don't
  for me (I've tried).
  
  I understand that the non-paranoid rest of the world can't 
 be bothered, but
  is there any intenstion to include such authentification 
 into the update
  procedures of R? I think for ftp it's kind of tricky, but 
 at least for http
  the authentification seems to be straight forward.
 
 It's available for http: see ?download.file, and you can even 
 configure 
 that to use wget.
 
 Your comments are very much misplaced: we *have* bothered to 
 provide the 
 facilities for you.
 
 -- 
 Brian D. Ripley,  [EMAIL PROTECTED]
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford, Tel:  +44 1865 272861 (self)
 1 South Parks Road, +44 1865 272866 (PA)
 Oxford OX1 3TG, UKFax:  +44 1865 272595
 


__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] Jonckheere-Terpstra test

2003-10-05 Thread Arne.Muller
Hello,

can anybody here explain what a Jonckheere-Terpstra test is and whether it is
implemented in R? I just know it's a non-parametric test, otherwise I've no
clue about it ;-( . Are there alternatives to this test?
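For what it's worth, it is a rank-based test for a trend across ordered groups; it is not
in base R (add-on packages provide it, e.g. clinfun's jonckheere.test, if I remember
correctly). A rough sketch of two related checks that are in base R, on made-up data:

  set.seed(1)
  g <- gl(3, 10, labels = c("low", "mid", "high"))   # ordered groups
  y <- rnorm(30) + 0.5 * as.integer(g)               # response with a trend
  kruskal.test(y ~ g)                                # unordered rank test
  cor.test(as.integer(g), y, method = "kendall", exact = FALSE)  # trend check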

thanks for help,

Arne

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] multi-dimensional hash

2003-10-02 Thread Arne.Muller
Hello,

I was wondering what the best data structure in R for a multi-dimensional
lookup table is, and how to implement it. I have several categories, say A, B,
C, ..., and within each of these categories there are other categories such
as a, b, c, ... . There can be up to 5 dimensions. The actual value for
[A][a]... is then a vector.

I'm looking forward to any suggestions,
+thanks very much for your help,

Arne

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] R book

2003-09-11 Thread Arne.Muller
Hi All,

I'd be interested in your opinions of the book

Introductory Statistics with R by Peter Dalgaard 

Does it give a good description of the R object concept, the language itself and
the statistical aspects (I am not a statistician)?

thanks for your opinion,

Arne

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


RE: [R] No joy installing R with shared libs

2003-09-09 Thread Arne.Muller
Hi,

I've experienced similar failures with the RSperl installation. So I'd be
interested if someone sorts out the library misery ... ;-)

Arne

 -Original Message-
 From: Laurent Faisnel [mailto:[EMAIL PROTECTED]
 Sent: 09 September 2003 12:48
 To: [EMAIL PROTECTED]
 Subject: Re: [R] No joy installing R with shared libs
 
 
   Can some kind soul please give me a fool proof recipe for 
 building R 
   and RSPython so that it actually works?
 
 
   I don't have a recipe, but one thought to help debug the 
 process:  Try
   installing RPy [1].  RPy also provides access to R via 
 Python and uses
   the libR.so library.  If you can install and import rpy without
   problem then it must be an issue with RSPython.
 
 Hi,
 
 I had problems of the same kind recently and finally gave up.
 I tried to install Rpy without success, errors with 
 undetected libraries 
 occured while I was making the import rpy from python 
 (especially with 
 libblas).
 Since I was not sure R was correctly configured I downloaded 
 the latest 
 version R-1.7.1 and tried to install it with R-enable-shared 
 option. I 
 could not get out of numerous errors.
 Please tell me whether the problem you had calling RSPython is solved 
 after installing RPy (if it was possible to install it).
 Good luck.
 
 Laurent
 
 __
 [EMAIL PROTECTED] mailing list
 https://www.stat.math.ethz.ch/mailman/listinfo/r-help


__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] all values from a data frame

2003-09-05 Thread Arne.Muller
Hello,

I've a data frame with 15 columns and 6000 rows, and I need the data in a
single vector of size 90,000 for a t-test. Is there such a conversion function in
R, or would I have to write my own loop over the columns?
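For illustration, a tiny sketch (3x3 instead of 6000x15): unlist(), or equivalently
as.vector(as.matrix(...)), flattens all columns into one vector without an explicit loop:

  df <- data.frame(a = 1:3, b = 4:6, c = 7:9)
  v <- unlist(df, use.names = FALSE)   # length 9 here; 90,000 in the real case
  t.test(v)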

thanks for your help + kind regards

Arne

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help