Re: [R] (no subject)
type ?par and then have a look at cex.lab and cex.main. cheers christoph [EMAIL PROTECTED] wrote: Dear ladies and gentlemen! When I use the plot function, how can I change the size of the titles for the x and y axes (xlab, ylab) and the size of the axis labels? Thank you very much. With best regards Claudia __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
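To illustrate the parameters involved (cex.axis is added here as well, since the question also asks about the size of the axis labels; all three are documented in ?par):

```r
# cex.lab scales the axis titles (xlab/ylab), cex.axis the tick-mark
# labels, cex.main the main title; all can be passed straight to plot().
plot(1:10, (1:10)^2,
     xlab = "x", ylab = "x squared", main = "cex demo",
     cex.lab  = 1.5,  # axis titles 50% larger than default
     cex.axis = 0.8,  # tick-mark labels 20% smaller
     cex.main = 2)    # main title twice the default size
```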
Re: [R] Getting eps into Word documents.
As far as I know (and I am not a specialist), there is no way of importing EPS into Word, nor into OpenOffice, if you want to see the graphics in the word processor and not only a placeholder. So either a) use formats such as bmp or png, or b) under Windows: convert the EPS to e.g. WMF using a tool such as Corel Draw, or import the EPS into PowerPoint and then save it as WMF, or c) try to import the EPS into Inkscape (I think this open-source tool is also available for Windows), but I have never tried this. I have to admit that under Linux I have found no way so far to import an EPS graphic into a Word document (neither OOo Writer nor MS Word running under Wine). Christoph tom wright wrote: On Mon, 2005-03-10 at 16:31 -0300, Rolf Turner wrote: A student in one of my courses has asked me about getting R graphics output (under Linux) into a Word document. I.e. she wants to do her R thing under Linux, but then do her word processing using Word. Scanning around the r-help archives I encountered an inquiry about this topic --- eps into Word documents --- from Paul Johnson but found no replies to it. I tried contacting him but the email address in the archives appeared not to be valid. Does anyone know a satisfactory solution to the problem of including a graphic which exists in the form of a *.eps (encapsulated postscript) file in a Word document? If so, would you be willing to share it with me and my student? If so, please be gentle in your explanation. I am not myself (repeat ***NOT***) a user of Word! Thanks. cheers, Rolf Turner [EMAIL PROTECTED] R can also create more generic image formats such as png; I just use these when I'm forced to insert graphics for presentations. ?png
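As a sketch of the png suggestion above (the file name is illustrative; PNG files import directly into Word):

```r
# Write the plot as a PNG, which Word imports natively.
# width/height are in pixels; res sets the nominal resolution in DPI.
png("myplot.png", width = 800, height = 600, res = 96)
plot(rnorm(100), main = "Exported for Word")
dev.off()  # close the device so the file is written out completely
```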
Re: [R] Creating an array [0 to 101] of a list
a <- array(vector("list", 3), dim = c(3)); a[[1]] <- list(x = 1, y = 0, z = -1); a[[2]] <- list(x = 0, y = 1, z = -1); a[[3]] <- list(x = 0, y = -1, z = 0) HTH christoph Rainer M. Krug wrote: Hi I looked, but I didn't find it: I need an array [0 to 101] where each element is a list (the result of Kest in spatstat). I.e. I want to do: A[2] <- Kest(pp1); A[3] <- Kest(pp2); ... A[101] <- Kest(pp100). How can I create A? Rainer
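Scaled up to the original request, a minimal sketch; the length 101 matches the indices used in the question, and the Kest calls are replaced by placeholder lists since spatstat point patterns are not available here:

```r
# A list of length 101 whose elements start out NULL;
# each slot can later hold an arbitrary list (e.g. a Kest result).
A <- vector("list", 101)
A[[2]] <- list(x = 1, y = 0)   # stand-in for Kest(pp1)
A[[3]] <- list(x = 0, y = 1)   # stand-in for Kest(pp2)
length(A)   # 101
A[[2]]$x    # 1
```

Note the double brackets: `A[[i]] <- ...` stores the list itself in slot i, whereas `A[i] <- ...` would try to replace a sublist.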
[R] plot.augPred sorted and labelled according second factor
Hi using this code example: library(nlme); fm1 <- lme(Orthodont, random = ~1); plot(augPred(fm1)). Is there any way to have the plots in each cell labelled and ordered according to Orthodont$Sex? I.e., in addition to the bar with the label for Orthodont$Subject, is there another bar labelling the Sex of the subject? thanks a lot christoph
[R] savePlot(type=wmf) in loop fails, since too fast
hi working with R-2.1.1 on WinXP: in a loop I draw to a trellis.device, which takes some time. After the drawing I call savePlot(). It seems the loop is too fast for the savePlot() call to finish. Is there any solution for such a problem? Calling the same steps outside the loop works fine. many thanks christoph
[R] lme model: Error in MEEM
Hi, We have data from two groups of subjects: 32 elderly, 14 young adults. For each subject we have 15 observations, each observation consisting of a reaction-time measure (RT) and an activation measure (betadlpcv). Since we want to analyze the influence of (age-)group and RT on the activation, we call: lme(betadlpcv ~ RT*group, data=our.data, random=~ RT |subject) this yields: Error in MEEM(object, conLin, control$niterEM) : Singularity in backsolve at level 0, block 1 In addition: Warning message: Fewer observations than random effects in all level 1 groups in: lme.formula(betadlpcv ~ RT * group, data = patrizia.data, random = ~RT | what's the problem here? thanks for your kind help christoph
Re: [R] lme model: Error in MEEM
sorry, RT had an error in the raw data and was treated as a factor. After correction of the raw data (RT is numeric) it now works fine. thanks a lot christoph --- Original Message --- From: Douglas Bates [EMAIL PROTECTED] To: Christoph Lehmann [EMAIL PROTECTED] Cc: r-help@stat.math.ethz.ch Subject: Re: [R] lme model: Error in MEEM Date: Thu, 18 Aug 2005 08:29:52 -0500 On 8/18/05, Christoph Lehmann [EMAIL PROTECTED] wrote: [original question quoted] It seems that you only have one observation per subject and you are trying to estimate a model with two random effects per subject plus the per-observation noise term. These terms are completely confounded.
[R] nonparametric 2way repeated-measures anova
Dear useRs, is there any nonparametric test for the analysis of variance in a design with two within-factors (repeated measures on both factors)? Friedman is not appropriate here, therefore I am grateful for any alternative test. thanks for any hint cheers christoph
[R] studentized CIs for a correlation using package boot
Dear useRs I need to compute studentized confidence intervals for a correlation, using the boot library. For these CIs we need to compute a variance estimate of the statistic (here the correlation coefficient) from each bootstrap sample. There are 2 important points, I think: (1) We need to do a Fisher transformation (atanh(x)) to correct for non-normality; this can be done easily by specifying the h, hinv, and hdot parameters in the boot.ci call. (2) An estimate for the variance is (as far as I remember) (1-correlation^2)^2/n (for Fisher-transformed data, an estimator is 1/(n-3)). Do you think this is the correct way: library(boot); fisher <- function(r) 0.5*log((1+r)/(1-r)); fisher.dot <- function(r) 1/(1-r^2); fisher.inv <- function(z) (exp(2*z)-1)/(exp(2*z)+1); boot.fun <- function(data, i) { n <- length(i); correlation <- cor(data[i,1], data[i,2]); v <- (1-correlation^2)^2/n; c(correlation, v) }; td.boot <- boot(td, boot.fun, R=); boot.ci(td.boot, h = fisher, hdot = fisher.dot, hinv = fisher.inv, conf = c(0.95)) ? many thanks for your thoughts cheers christoph
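A self-contained version of the sketch above, with simulated data and a replicate count filled in for illustration (R = 999, the simulated data frame `td`, and the explicit type = "stud" are my choices, not from the original post):

```r
library(boot)

# Fisher z-transform, its derivative, and its inverse,
# passed to boot.ci() as h, hdot and hinv.
fisher     <- function(r) 0.5 * log((1 + r) / (1 - r))
fisher.dot <- function(r) 1 / (1 - r^2)
fisher.inv <- function(z) (exp(2 * z) - 1) / (exp(2 * z) + 1)

# Statistic: correlation plus a variance estimate per resample,
# the second component is what studentized ("stud") intervals need.
boot.fun <- function(data, i) {
  n <- length(i)
  correlation <- cor(data[i, 1], data[i, 2])
  v <- (1 - correlation^2)^2 / n
  c(correlation, v)
}

set.seed(1)
td <- as.data.frame(MASS::mvrnorm(100, mu = c(0, 0),
                                  Sigma = matrix(c(1, .6, .6, 1), 2)))
td.boot <- boot(td, boot.fun, R = 999)
ci <- boot.ci(td.boot, type = "stud", conf = 0.95,
              h = fisher, hdot = fisher.dot, hinv = fisher.inv)
ci$student   # confidence level plus lower/upper interval endpoints
```

Because hinv maps back through the inverse Fisher transform, the resulting endpoints stay inside (-1, 1), as a correlation interval should.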
Re: [R] Change class factor to numeric
?as.factor states: To revert a factor 'f' to its original numeric values, 'as.numeric(levels(f))[f]' is recommended and slightly more efficient than 'as.numeric(as.character(f))'. christoph Petr Pikal wrote: Hi On 6 May 2005 at 8:51, Paulo Justiniano Ribeiro Jr wrote: as.numeric() Not exactly correct. as.numeric(as.character()) gives you what you probably want, if mass really is a factor ;) see str(mass) Cheers Petr On Fri, 6 May 2005, Smit, R. (Robin) (IenT) wrote: I am attempting to develop a multiple regression model using selected model variables that should all be treated as numeric (mostly real) values. However, R automatically considers one specific variable, mass, to be of class factor, probably because mass consists of integer values that are repeated. I now want to force R to treat mass as a numeric variable in the regression but am not sure how to do this. class(mass) <- numeric does not help me. Could anyone advise me on this? Kind regards, Robin Smit TNO Science & Technology, Business Unit Automotive, Environmental Studies & Testing, PO Box 6033, 2600 JA Delft, THE NETHERLANDS
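A quick demonstration of why a plain as.numeric() is wrong on such a factor, and of the two recommended idioms (the example values are made up):

```r
# A factor built from repeated integer values, as in the question.
mass <- factor(c(1500, 1200, 1500, 1800))

as.numeric(mass)                 # wrong: returns the level codes 2 1 2 3
as.numeric(as.character(mass))   # correct: 1500 1200 1500 1800
as.numeric(levels(mass))[mass]   # correct, and slightly more efficient
```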
Re: [R] Help with R
I heard that 'R' does not do a very good job at handling large datasets, is this true? Importing huge datasets into a data.frame with e.g. a subsequent conversion of some columns into factors may lead to memory troubles (probably due to memory overhead when building the factors). But we recently succeeded in importing 12 million data records stored in a MySQL database, using the RMySQL package. The procedure which led to success was: 0) define a data.frame 'data.total' with the size necessary to keep the whole data set to be imported; then in a loop: 1) import the data in chunks of e.g. 3 records per chunk and save each chunk in a temporary data.frame 'data.chunk'; 2) the conversion into factors and other preprocessing steps, such as data aggregation, are done for each single chunk saved in 'data.chunk' after import; 3) the now preprocessed chunk is saved into the appropriate part of the data.frame 'data.total' defined at the beginning; 4) the whole dataset is imported and the data.frame 'data.total' is ready for further computational steps. In a nutshell: preprocessing steps such as conversion into factors yield memory troubles, even for data sets which per se don't take too much memory; but done separately on smaller chunks of data, it can be done with R very efficiently. The 'team' MySQL together with R is VERY powerful. Cheers Christoph
Re: [R] selections of data by one variable
test <- data.frame(cbind(1:10, 11:20)); names(test) <- c("a", "b"); test[test$b == 17, ]; test[test$b %in% c(13, 15, 17), ] Tu Yu-Kang wrote: Dear R experts, My problem is as follows: Suppose I have a data frame d comprising two variables a <- c(1:10) and b <- c(11:20). I now want to select a subgroup according to the values of b. I know if I just want to select, say, b == 17, I can use f <- d[d$b == 17, ] and R will give me f a b 7 7 17. However, if now I want to select a subgroup according to b == e <- c(13,15,17), then the same syntax doesn't work. What is the correct way to do it? My data have more than one million subjects, and I want to select part of them according to their id numbers. Your help will be highly appreciated. Best regards, Yu-Kang
[R] RMySQL installation: libz missing
Hi I run SuSE Linux 9.1 and I installed MySQL server, client, devel, bench. DBI is installed; when I try to install RMySQL I get an error saying that libz is missing. (Paths to libs were set: export PKG_CPPFLAGS=-I/usr/include/mysql/ ; export PKG_LIBS="-L/usr/lib/mysql/ -lmysqlclient") So my question: where do I get the libz files (are these MySQL files? If yes, why were they not installed at least by mysql-devel?) thanks for your kind help christoph
Re: [R] RMySQL installation: libz missing SOLVED
it seemed to be a problem with the rpm for SuSE 9.1. I installed and compiled R 2.1 from the sources, and then the installation of RMySQL succeeded. mmmh... Christoph Christoph Lehmann wrote: [original question quoted]
[R] RMySQL query: why result takes so much memory in R ?
Hi I just started with RMySQL. I have a database with roughly 12 million rows/records and 8 columns/fields. From all 12 million records I want to import 3 fields only. The fields are specified as: id int(11), group char(15), measurement float(4,2). Why does this take 1G RAM? I run R on SuSE Linux with 1G RAM, and with the code below it even fills the whole 1G of swap. I just don't understand how 12e6 * 3 values can fill such a huge range of RAM? Thanks for clarification and potential solutions. ## my code: library(RMySQL); drv <- dbDriver("MySQL"); ch <- dbConnect(drv, dbname = "testdb", user = "root", password = "mysql"); testdb <- dbGetQuery(ch, "select id, group, measurement from mydata"); dbDisconnect(ch); dbUnloadDriver(drv) ## end of my code. Cheers Christoph
[R] summary(as.factor(x) - force to not sort the result according factor levels
Hi The result of a summary(as.factor(x)) call (see example below) is sorted according to the factor levels. How can I get the result not sorted, but in the original order of the levels in x? test <- c(120402, 120402, 120402, 1323, 1323, 200393, 200393, 200393, 200393, 200393); summary(as.factor(test)) gives: 1323 120402 200393 / 2 3 5. I need: 120402 1323 200393 / 3 2 5. thanks for a hint christoph
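One way to get this (a standard idiom, not an answer from the thread itself) is to fix the level order explicitly via the levels argument of factor():

```r
test <- c(120402, 120402, 120402, 1323, 1323,
          200393, 200393, 200393, 200393, 200393)

# levels = unique(test) keeps the order of first appearance
# instead of the default sorted level order.
summary(factor(test, levels = unique(test)))
# 120402   1323 200393
#      3      2      5
```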
[R] large dataset import, aggregation and reshape
Dear useRs We have a data set (comma delimited) with 12 million rows and 5 columns (in fact many more, but we need only 4 of them): id, factor 'a' (5 levels), factor 'b' (15 levels), date-stamp, numeric measurement. We run R on SuSE Linux 9.1 with 2GB RAM (and a 3.5GB swap file). On average we have 30 obs. per id. We want to aggregate (e.g. the sum of the measurements under each factor level of 'a', and the same for factor 'b') and reshape the data so that for each id we have only one row in the final data.frame, meaning we finally have roughly 400'000 lines. I tried read.delim, used the nrows argument, defined colClasses (with an as.Date class): memory problems at the latest when calling reshape and aggregate. Also importing the date column as character and then converting the dates column using 'as.Date' didn't succeed. It seems the problematic, memory-intensive parts are: a) importing the huge data per se (but does data with dim c(12e6, 5) really take 2GB?) b) converting the time-stamp to a 'Date' class c) the aggregate and reshape task. What are the steps you would recommend? (i) using scan instead of read.delim (with or without colClasses?) (ii) importing blocks of data (e.g. 1 million lines at a time), aggregating them, then importing the next block, and so on? (iii) putting the data into a MySQL database, importing from there and doing the reshape and aggregation in R for both factors separately? thanks for hints from your valuable experience cheers christoph
Re: [R] Bootstrap / permutation textbooks
look at: AC Davison, DV Hinkley: Bootstrap Methods and Their Application. There is also an R library 'boot', based on the methods reported in this book. C Peter Soros wrote: Dear R experts, I would like to explore if and to what extent bootstrapping and permutation statistics can help me in my research (functional brain imaging). I am looking for an introductory textbook, rather legible. I have statistical knowledge, but I am definitely no statistical or mathematical guru. Do you have suggestions for a useful textbook? Thanks a lot, Peter
[R] lsfit result - how to compute t-values for coefficients
Hi I used lsfit instead of lm since I have a huge Y data set (X being constant for all Y). Since I require the t-values for all coefficients: which would be the fastest way to compute them, e.g. for the example: ## using lsfit with a matrix response: t.length <- 5; d.dim <- c(t.length, 7, 8, 9) (dimensions: time, x, y, z); Y <- array(rep(1:t.length, prod(d.dim)) + rnorm(prod(d.dim), 0, 0.1), d.dim); X <- cbind(c(1,3,2,4,5), c(1,1,1,5,5)); date(); rsq <- lsfit(X, array(c(Y), dim = c(t.length, prod(d.dim[2:4]))))$coef[2, ] (coef for first non-const pred); names(rsq) <- prod(d.dim[2:4]); rsq <- array(rsq, dim = d.dim[2:4]); date(). What would be the best way to get the t-values for all coefs, not only (as illustrated above for the beta value) for one predefined coef? many thanks christoph
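One way to get standard errors (and hence t-values) out of an lsfit result is ls.diag(); a small sketch for a single series, cross-checked against lm() (applying this per column of a response matrix is my extension, not something stated in the post):

```r
set.seed(42)
X <- cbind(c(1, 3, 2, 4, 5), c(1, 1, 1, 5, 5))
y <- 2 + 0.5 * X[, 1] - 0.3 * X[, 2] + rnorm(5, 0, 0.1)

fit  <- lsfit(X, y)
se   <- ls.diag(fit)$std.err   # standard errors of all coefficients
tval <- fit$coef / se          # t-value = estimate / standard error

# Cross-check: lm() fits the same OLS model, so the t-values must agree.
summary(lm(y ~ X))$coef[, "t value"]
tval
```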
[R] apply vs sapply vs loop - lm() call appl(y)ied on array
Dear useRs (code of the mentioned small example is below) I have 7 * 8 * 9 = 504 series of data (each of length 5). For each of these series I want to compute a lm(), where the design matrix X is the same for all these computations. The 504 series are in an array of dimension d.dim <- c(5, 7, 8, 9), meaning the first dimension holds the data series. The lm computation needs performance optimization, since in fact the dimensions are much larger. I compared the following approaches: using a for-loop, using apply, and using sapply. All of these require roughly the same computation time. I was astonished, since I expected at least sapply to outperform the for-loop. Do you have another solution for me which is faster? many thanks here is the code ## -- t.length <- 5; d.dim <- c(t.length, 7, 8, 9) (dimensions: time, x, y, z); Y <- array(rep(1:t.length, prod(d.dim)) + rnorm(prod(d.dim), 0, 0.1), d.dim); X <- c(1,3,2,4,5) ## performance tests ## using for loop: date(); z <- rep(0, prod(d.dim[2:4])); l <- 0; for (i in 1:dim(Y)[4]) for (j in 1:dim(Y)[3]) for (k in 1:dim(Y)[2]) { l <- l + 1; z[l] <- unlist(summary(lm(Y[, k, j, i] ~ X)))$r.squared }; date() ## using apply: date(); z <- apply(Y, 2:4, function(x) unlist(summary(lm(x ~ X)))$r.squared); date() ## using sapply: date(); fac <- rep(1:prod(d.dim[2:4]), rep(t.length, prod(d.dim[2:4]))); z <- sapply(split(as.vector(Y), fac), FUN = function(x) unlist(summary(lm(x ~ X)))$r.squared); dim(z) <- d.dim[2:4]; date() ## -- -- Christoph Lehmann, Department of Psychiatric Neurophysiology, University Hospital of Clinical Psychiatry, Waldau, CH-3000 Bern 60, [EMAIL PROTECTED], http://www.puk.unibe.ch/cl/pn_ni_cv_cl_04.html
Re: [R] lda (MASS)
Now, I use my real dataset (900 instances, 21 attributes), whose 2 classes can be separated with an accuracy of no more than 80% (10xval) with KNN, SVM, C4.5 and the like. I think these accuracies are based on cross-validation runs, whereas the 80% accuracy you report using LDA is not based on cross-validation runs, as long as CV is not set to TRUE. PS: and does anybody know how to use the CV option of lda to do xval? I can't get it. z <- lda(Sp ~ ., Iris, CV = TRUE); table(Iris$Sp, z$class) cheers christoph
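A runnable sketch of the CV suggestion; the Iris data frame is constructed as in the examples of ?lda in MASS, and the accuracy cross-check at the end is my addition:

```r
library(MASS)

# The classic Iris data frame used in the examples of ?lda.
Iris <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]),
                   Sp = rep(c("s", "c", "v"), rep(50, 3)))

# CV = TRUE performs leave-one-out cross-validation; the jackknifed
# class predictions come back in z$class (no predict() step needed).
z <- lda(Sp ~ ., Iris, CV = TRUE)
table(Iris$Sp, z$class)      # cross-validated confusion matrix
mean(Iris$Sp == z$class)     # leave-one-out accuracy
```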
Re: [R] apply vs sapply vs loop - lm() call appl(y)ied on array
Ok, thanks to a hint from Matthew pointing to a former post with a similar request, I now have three faster solutions (see below), the last one being the fastest, but the former two are also faster than the for-loop, apply(lm(formula)) and sapply(lm(formula)) versions in my last mail. One problem only: using lsfit I can't directly get measures such as r.squared ... --- ## using lm with a matrix response (recommended by BDR): date(); rsq <- unlist(summary(lm(array(c(Y), dim = c(t.length, prod(d.dim[2:4]))) ~ X)))[seq(22, prod(d.dim[2:4]) * 30, by = 30)] (get the r.squared list element); names(rsq) <- prod(d.dim[2:4]); rsq <- array(rsq, dim = d.dim[2:4]); date() ## using sapply and lsfit instead of lm (recommended by Kevin Wright): date(); fac <- rep(1:prod(d.dim[2:4]), rep(t.length, prod(d.dim[2:4]))); z <- sapply(split(as.vector(Y), fac), FUN = function(x) lsfit(X, x)$coef[2]); dim(z) <- d.dim[2:4]; date() ## using lsfit with a matrix response: date(); rsq <- lsfit(X, array(c(Y), dim = c(t.length, prod(d.dim[2:4]))))$coef[2, ]; names(rsq) <- prod(d.dim[2:4]); rsq <- array(rsq, dim = d.dim[2:4]); date() -- thanks Christoph Wiener, Matthew wrote: Christoph -- There was just a thread on this earlier this week. You can search the archives for the title: refitting lm() with same x, different y. (Actually, it doesn't turn up in the R site search yet, at least for me. But if you just go to the archive of recent messages, available through CRAN, you can search on refitting and find it. The original post was from William Valdar, on April 19.) Hope this helps, Matt Wiener -Original Message- From: Christoph Lehmann, Sent: Thursday, April 21, 2005 9:24 AM, To: R-help@stat.math.ethz.ch, Subject: [R] apply vs sapply vs loop - lm() call appl(y)ied on array [original message quoted]
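The matrix-response trick in the first solution can be seen in isolation: lm() fits all columns of a response matrix in one call, which is where the speed-up comes from. A minimal sketch on a tiny array (dimensions shrunk for illustration):

```r
set.seed(1)
X <- c(1, 3, 2, 4, 5)
Y <- matrix(rep(1:5, 4) + rnorm(20, 0, 0.1), nrow = 5)  # 4 series, length 5

fit <- lm(Y ~ X)   # one fit for all four series
coef(fit)          # 2 x 4 matrix: intercept and slope per series

# r.squared per series, without the unlist()-and-index trick:
# summary() of a multi-response fit returns one summary per column.
sapply(summary(fit), function(s) s$r.squared)
```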
Re: [R] basic question
sapply(split(test, test$year), function(x) list(x.s = sum(x$x), y.s = sum(x$y), z.s = sum(x$z))) or, for one variable only: aggregate(test$x, list(id = test$year), sum) cheers christoph jose silva wrote: I know this question is very simple, but I cannot figure it out. I have the data frame: test <- data.frame(year = c(2000,2000,2001,2001), x = c(54,41,90,15), y = c(29,2,92,22), z = c(26,68,46,51)). I want to sum the vectors x, y and z within each year (2000 and 2001) to obtain this: year x y z / 1 2000 95 31 94 / 2 2001 105 114 97. I tried tapply but it did not work (or probably I did it wrong). Any suggestions? silva
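aggregate() can also handle all three columns in a single call, which gives exactly the data frame the question asks for (passing a sub-data.frame of the measurement columns is my suggestion, not part of the original reply):

```r
test <- data.frame(year = c(2000, 2000, 2001, 2001),
                   x = c(54, 41, 90, 15),
                   y = c(29,  2, 92, 22),
                   z = c(26, 68, 46, 51))

# Sum every measurement column within each year at once.
aggregate(test[c("x", "y", "z")], by = list(year = test$year), FUN = sum)
#   year   x   y  z
# 1 2000  95  31 94
# 2 2001 105 114 97
```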
[R] indexing an array using an index-array, but one entry being ', '
Hi I have the following array: test <- array(c(1:16), dim = c(3,4,3)). ## I call some entries using an index array: test.ind <- array(rbind(c(1,2,1), c(3,3,2)), dim = c(2,3)); test[test.ind] ## suppose I want all values in the 2nd row and 4th col over all three 3rd dimensions: test[2,4,]. How do I specify a test.ind array with the last index left open (','), i.e. test.ind should be evaluated as 2, 4, , so that it can be called like above as test[test.ind], and the result should be [1] 11 7 3? thanks for a hint Cheers christoph
Re: [R] indexing an array using an index-array, but one entry being ', '
OK, with the hint by Dimitris applied, I just do, very simply: test <- array(c(1:16), dim = c(3,4,3)) ## I call some entries using an index array: test.ind <- array(rbind(c(1,2,1), c(3,3,2)), dim = c(2,3)); test[test.ind] ## suppose I want all values in the 2nd row and 4th col over all three 3rd dimensions: test[2,4,] ## using an index array: nn <- dim(test)[3]; voxel.ind <- c(2, 4); test.ind <- array(cbind(rep(voxel.ind[1], nn), rep(voxel.ind[2], nn), 1:nn), dim = c(nn, 3)); test[test.ind] cheers christoph Christoph Lehmann wrote: [original question quoted]
Re: [R] select cases
subset(your.data.frame, sex == 'male') cheers c Faouzi LYAZRHI wrote: Hi, I would like to select a few cases (for example the cases corresponding to sex == 'male') to compute a summary for another variable. How can I do this? Thanks for your help Fawtzy
Re: [R] Package 'R2HTML'
try this little example: save the code below in a file test.Rnw and then call Sweave("test.Rnw", driver = RweaveHTML()) -- <html> <body> <h1>testing r2html</h1> <p>look at this: here you can write some text</p> <font color="darkred"><b>\Sexpr{format(Sys.time(), "%Y")}</b></font>. <<echo=FALSE>>= summary(data.frame(c(1,2,3), c(3,4,5))) @ <p>insert some graphics</p> <<echo=FALSE,fig=TRUE,border=1,width=900,height=500,HTMLwidth=900,HTMLheight=500>>= print(plot(c(1:30), c(1:30))) @ </body> </html> -- hope it helps, let me know christoph Singh, Avneet wrote: I recently learnt how to use Sweave, which is a wonderful tool. After which I also tried to use R2HTML, as it would allow many of my colleagues who don't use LaTeX to be able to use and edit my work. I was unable to make it work and couldn't find a way to implement it; I got some errors. I wonder if you could help me with it. I have a Windows 2000 OS and the version of R is R version 1.9.1, 2004-06-21. This is the command I gave, followed by the output; I couldn't find much info on SweaveParseOptions: Sweave("Sweave-test-1.Rnw", driver = RweaveHTML) Writing to file Sweave-test-1.html Processing code chunks ... Error: couldn't find function SweaveParseOptions. I have no data yet. It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories instead of theories to suit facts. ~ Sir Arthur Conan Doyle (1859-1930), Sherlock Holmes
[R] colClasses = Date in read.delim, how to pass date-format?
Hi I have a huge data set with one column of type date. Of course I can import the data using this column as factor and then convert it later to dates, using:
sws.bezuege$FaktDat <- dates(as.character(sws.bezuege$FaktDat),
                             format = c(dates = "d.m.y"))
But the conversion requires a huge amount of memory (and time); therefore I would like to use colClasses = c("Date"). My question: since I have format = c(dates = "d.m.y"), how can I pass this option to read.delim(..., colClasses = c("Date"))? thanks for a hint cheers christoph
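One route (a sketch, not tested against the original data) is the mechanism documented in ?read.table: colClasses may name any class for which an as() method from "character" exists, so a small S4 coercion can carry the date format. The class name myDate and the file name are hypothetical, and note this yields base-R Date objects, not chron dates:

```r
library(methods)

# define a class and how to coerce character fields into it
setClass("myDate")
setAs("character", "myDate",
      function(from) as.Date(from, format = "%d.%m.%Y"))

# read.delim then converts the column while reading
# (hypothetical file and column name)
sws.bezuege <- read.delim("bezuege.txt",
                          colClasses = c(FaktDat = "myDate"))
```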
Re: [R] colClasses = Date in read.delim, how to pass date-format?
so what do you recommend? I just need to be able to sort a data.frame by the date entry and, e.g., compute differences between subsequent dates. Shall I stay with dates (thanks for the hint about the confusion of Date and dates), or is there a better way for this kind of task? thanks a lot Cheers Christoph
You are confusing class Date (part of R) with class dates (part of package chron). There is no as() method for class dates, so you can't do this. You can read the column as character (not factor) and convert later, but it sounds like the `huge amount of memory (and time)' is in fact taken by package chron.
On Mon, 18 Apr 2005, Christoph Lehmann wrote: I have a huge data set with one column of type date. [...] since I have format = c(dates = "d.m.y"), how can I pass this option to read.delim(..., colClasses = c("Date"))?
-- Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self), +44 1865 272866 (PA)
1 South Parks Road, Oxford OX1 3TG, UK, Fax: +44 1865 272595
[R] read.delim: only first column import
Hi if I use read.delim, I can specify how many lines I want to import. Is there also a way to specify that, e.g., only the first column (field) of each line should be imported? thanks for a hint cheers christoph
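One way (a sketch, assuming a tab-delimited file with, say, five columns): colClasses accepts "NULL" to skip a column entirely and NA for default conversion, so only the first column is ever stored:

```r
# keep only the first of five columns; "big.txt" is a hypothetical file
first.col <- read.delim("big.txt",
                        colClasses = c(NA, rep("NULL", 4)))
```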
[R] aggregation question
Hi I have a question concerning aggregation (simple demo code below). I have the data.frame

   id        meas date
1   a 0.637513747    1
2   a 0.187710063    2
3   a 0.247098459    2
4   a 0.306447690    3
5   b 0.407573577    2
6   b 0.783255085    2
7   b 0.344265082    3
8   b 0.103893068    3
9   c 0.738649586    1
10  c 0.614154037    2
11  c 0.949924371    3
12  c 0.008187858    4

When I want for each id the sum of its meas I do:
aggregate(data$meas, list(id = data$id), sum)
If I want to know the number of meas(ures) for each id I do, e.g.:
aggregate(data$meas, list(id = data$id), length)
NOW: Is there a way to compute the number of meas(ures) for each id with non-identical date (e.g. using diff())? so that I get e.g.:

  id x
1  a 3
2  b 2
3  c 4

I am sure it must be possible. thanks for any (even short) hint cheers Christoph
--
data <- data.frame(c(rep("a", 4), rep("b", 4), rep("c", 4)), runif(12),
                   c(1, 2, 2, 3, 2, 2, 3, 3, 1, 2, 3, 4))
names(data) <- c("id", "meas", "date")
m <- aggregate(data$meas, list(id = data$id), sum)
names(m) <- c("id", "cum.meas")
Re: [R] aggregation question
Dear Sundar, dear Andy, many thanks for the length(unique(x)) hint. It solves my problem in a very elegant way, of course. Just out of curiosity (or for potential future problems): how could I solve it in a conceptually different way, namely so that the computation on 'meas' depends on the variable 'date'? That is, the computation on a variable x in the function passed to aggregate is conditional on the value of another variable y. I hope you understand what I mean; let's think of an example: e.g. for the example data.frame below, the sum shall be taken over the variable meas only for all entries with a corresponding 'date' != 2. For this, do I have to nest two aggregate statements, or is there a way using sapply or similar apply-based commands? thanks a lot for your kind help. Cheers! Christoph
aggregate(data$meas, list(id = data$id), sum)
Christoph Lehmann wrote on 4/15/2005 9:51 AM: Hi I have a question concerning aggregation (simple demo code below). [...] NOW: Is there a way to compute the number of meas(ures) for each id with non-identical date (e.g. using diff())?
so that I get e.g.:

  id x
1  a 3
2  b 2
3  c 4

I am sure it must be possible. thanks for any (even short) hint cheers Christoph
--
data <- data.frame(c(rep("a", 4), rep("b", 4), rep("c", 4)), runif(12),
                   c(1, 2, 2, 3, 2, 2, 3, 3, 1, 2, 3, 4))
names(data) <- c("id", "meas", "date")
m <- aggregate(data$meas, list(id = data$id), sum)
names(m) <- c("id", "cum.meas")
How about:
m <- aggregate(data["date"], data["id"], function(x) length(unique(x)))
--sundar
--
RE: [R] aggregation question
great, Andy! Thanks a lot - I didn't know split. So 'split' can be used as an alternative to 'aggregate', with the advantage that in the self-defined function one passes, one can consider more than one variable of the to-be-aggregated data.frame? Christoph
If I understood you correctly, here's one way:
sumWO2 <- sapply(split(dat, dat$id), function(d) sum(d$meas[d$date != 2]))
sumWO2
        a         b         c
0.9439614 0.4481582 1.6967618
Andy
From: Christoph Lehmann Dear Sundar, dear Andy, many thanks for the length(unique(x)) hint. [...] For this, do I have to nest two aggregate statements, or is there a way using sapply or similar apply-based commands? [...] NOW: Is there a way to compute the number of meas(ures) for each id with non-identical date (e.g. using diff())?
so that I get e.g.:

  id x
1  a 3
2  b 2
3  c 4

[...] How about:
m <- aggregate(data["date"], data["id"], function(x) length(unique(x)))
--sundar
--
[R] aggregate slow with variables of type 'dates' - how to solve
Dear all, I use aggregate with variables of type numeric and dates. For type numeric, functions such as sum() are very fast, but similarly simple functions, such as min(), are much slower for variables of type 'dates'. The difference gets bigger the larger the 'id' variable is - but see this sample code:

dts <- dates(c("02/27/92", "02/27/92", "01/14/92", "02/28/92", "02/01/92"))
ntimes <- 70
dts <- data.frame(rep(c(1:40), ntimes/8),
                  chron(rep(dts, ntimes), format = c(dates = "m/d/y")),
                  rep(c(0.123, 0.245, 0.423, 0.634, 0.256), ntimes))
names(dts) <- c("id", "date", "tbs")
date()
dat.1st <- aggregate(dts$date, list(id = dts$id), min)$x
dat.1st <- chron(dat.1st, format = c(dates = "m/d/y"))
dat.1st
date() # 82 seconds
date()
tbs.s <- aggregate(as.numeric(dts$tbs), list(id = dts$id), sum)
tbs.s
date() # 17 seconds

---
is it a problem of data type 'dates'? If yes, is there any solution, since for huge data sets this can be a problem... As I mentioned, if variable 'id' has e.g. just 5 levels, the two times are roughly the same, but with the 40 different ids we have this big difference. thanks a lot Christoph
--
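A workaround sketch for the timing gap described above: run aggregate() on the numeric representation of the chron dates (min() on plain numbers is fast) and re-attach the chron format afterwards; this assumes the dts data frame built in the example:

```r
# aggregate on the underlying day numbers instead of the chron objects
dat.1st <- aggregate(as.numeric(dts$date), list(id = dts$id), min)$x

# turn the numeric minima back into chron dates
dat.1st <- chron(dat.1st, format = c(dates = "m/d/y"))
```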
[R] sweave bwplot error
Hi I use Sweave and have a problem with the following figure, but not with other figures:

tt <- data.frame(c("a", "b", "c"), c(1.2, 3, 4.5))
names(tt) <- c("x1", "x2")
bwplot(x2 ~ x1, data = tt)

ok, now in Sweave:

\begin{figure}[H]
\begin{center}
<<echo=FALSE, fig=TRUE, height=5, width=10>>=
lset(col.whitebg())
bwplot(x2 ~ x1, data = tt)
@
\caption{xxx}
\end{center}
\end{figure}

PROBLEM: the pdf of the figure is not created correctly (neither is the eps), and the error I get from Sweave is:
pdf inclusion: required page does not exist 0
thanks for help christoph
[R] scan html: sep = td
Hi I try to import HTML text, and I need to split the fields at each <td> or </td> entry. How can I succeed? sep = 'td' doesn't yield the right result. thanks for hints
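scan()'s sep must be a single character, so one sketch is to read whole lines and split on the tags with a regular expression (strsplit() treats its split argument as a regex); the file name is hypothetical:

```r
# read the raw HTML line by line (hypothetical file name)
lines <- readLines("page.html")

# "</?td>" matches both <td> and </td>, splitting each line into fields
fields <- strsplit(lines, "</?td>")
```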
[R] aov or t-test applied on all variables of a data.frame
Hi I have a data.frame with, say, 10 continuous variables and one grouping factor (say 3 levels). How can I easily (without loops) apply to each continuous variable e.g. an aov with the grouping factor as my factor (or, if the grouping factor has 2 levels, e.g. a t-test)? thanks for a hint cheers christoph
Re: [R] aov or t-test applied on all variables of a data.frame
many thanks for the sapply hint. How can I use sapply to get a compact result of the aov computation? Say I call
sapply(dd[-1], function(y, f) aov(y ~ f), f = dd$V1)
aov gives its result in another form than t.test. thanks a lot
Peter Dalgaard wrote: Christoph Lehmann [EMAIL PROTECTED] writes: Hi I have a data.frame with, say, 10 continuous variables and one grouping factor [...] thanks for a hint
Generally something with lapply or sapply, e.g.
lapply(dd[-1], function(y) t.test(y ~ dd$V1))
$V2
        Welch Two Sample t-test
data:  y by dd$V1
t = 1.5465, df = 39.396, p-value = 0.13
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.02500802  0.18764439
sample estimates:
mean in group 1 mean in group 2
       1.096818        1.015500
...etc, one for each of V2..V8, or, in a more compact form:
sapply(dd[-1], function(y) t.test(y ~ dd$V1))[1:3,]
                 V2        V3         V4         V5        V6        V7
statistic  1.546456  1.008719 0.08158578 -0.2456436 -0.872376 -1.405966
parameter  39.39554  36.30778   39.70288   36.99061  36.99944  35.97947
p.value   0.1299909 0.3197851   0.935386   0.807316 0.3886296 0.1683118
                  V8
statistic -0.6724112
parameter   29.65156
p.value    0.5065284
or (this'll get the confidence intervals and estimates printed sensibly):
sapply(dd[-1], function(y) unlist(t.test(y ~ dd$V1)[1:5]))
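For a compact aov result, one sketch (same dd / dd$V1 layout as in the examples above) is to keep only the F-test p-value from each fit:

```r
# one p-value per continuous variable; summary(aov(...))[[1]] is the ANOVA
# table, whose "Pr(>F)" column holds the p-value for the grouping factor
sapply(dd[-1], function(y) summary(aov(y ~ dd$V1))[[1]][["Pr(>F)"]][1])
```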
[R] plot question
I have the following simple situation:
tt <- data.frame(c(0.5, 1, 0.5))
names(tt) <- "a"
plot(tt$a, type = 'o')
gives the following plot [ASCII sketch: the points at x = 1, 2, 3 are drawn flush against the left and right plot borders]. What do I have to change to get the following [same sketch, but with the first and last points sitting inside the borders]? i.e. the plot region should be widened at the left and right side. thanks for a hint christoph
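One way to get the padding (a sketch): extend xlim beyond the data range so the first and last points sit inside the borders:

```r
tt <- data.frame(a = c(0.5, 1, 0.5))

# x runs from 1 to 3; asking for 0.5..3.5 leaves space at both sides
plot(tt$a, type = "o", xlim = c(0.5, 3.5))
```

The related par() setting xaxs controls whether R pads the requested range by 4% ("r", the default) or uses it exactly ("i").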
[R] matlab norm(h) command in R: sqrt(sum(h^2)) - use in an expression
Hi in matlab I defined a function (double gamma, parameters at the end of this mail) as
h(i)=((t/d1)^a1)*exp(-(t-d1)/b1)-c*((t/d2)^a2)*exp(-(t-d2)/b2);
h=h/norm(h);
I do know that norm() in matlab is equal to sqrt(sum(x^2)) in R, so in R I do it like:
# function (double gamma)
h <- expression((t/d1)^a1*exp(-(t-d1)/b1) - c*(t/d2)^a2*exp(-(t-d2)/b2))
# plot it
t <- seq(0, 2, by = 100)
t <- t/1000
plot(eval(h), type = 'l')
# however this yields an error
h <- h/sqrt(sum(h^2))
Error in h^2 : non-numeric argument to binary operator
what shall I do to get the matlab h = h/norm(h) implemented in R? thanks for a hint christoph
# parameters
peak1 <- 5.4
fwhm1 <- 5.2
peak2 <- 10.8
fwhm2 <- 7.35
dip <- 0.35
b1 <- 0.9 # dispersion
b2 <- 0.9 # dispersion
a1 <- peak1/b1
a2 <- peak2/b2
d1 <- a1*b1
d2 <- a2*b2
c <- dip
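A sketch of the usual fix: norm() must act on numbers, so evaluate h over the grid first and normalize the resulting vector (for a vector, matlab's norm(h) is sqrt(sum(h^2))); this assumes t and the parameters defined in the post:

```r
hv <- eval(h)                 # numeric vector: h evaluated at the grid t
hv <- hv / sqrt(sum(hv^2))    # matlab's h = h/norm(h)
```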
[R] D(eval(g)) problem, since Function `eval' is not in the derivatives table
thanks Andy and Dimitris for your reply to my expression/eval problem. Starting with the resulting expression g, I need g's derivative as an expression, but I get: Function `eval' is not in the derivatives table:
# function (double gamma)
h <- expression((t/d1)^a1*exp(-(t-d1)/b1) - c*(t/d2)^a2*exp(-(t-d2)/b2))
# plot it
t <- seq(0, 2, by = 100)
t <- t/1000
g <- expression(eval(h)/sqrt(sum(eval(h)^2)))
plot(eval(g), type = 'l')
g.deriv <- D(g, "t")
Error in D(g, t) : Function `eval' is not in the derivatives table
is there any way to solve this problem? thanks a lot christoph
## --
## parameters
peak1 <- 5.4
fwhm1 <- 5.2
peak2 <- 10.8
fwhm2 <- 7.35
dip <- 0.35
b1 <- 0.9 # dispersion
b2 <- 0.9 # dispersion
a1 <- peak1/b1
a2 <- peak2/b2
d1 <- a1*b1
d2 <- a2*b2
c <- dip
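One possible workaround (a sketch, assuming h, t and the parameters from the post): D() handles the plain expression, and since the normalizing constant sqrt(sum(h^2)) is a fixed number once the grid is fixed, the derivative of h/||h|| is simply D(h)/||h||:

```r
dh  <- D(h[[1]], "t")        # symbolic derivative of the un-normalized h
hv  <- eval(h)
nrm <- sqrt(sum(hv^2))       # constant with respect to t
g.deriv.values <- eval(dh) / nrm
```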
[R] R on solaris 10 on a 64-bit ultra60 sparc
Hi since I have no experience with Solaris and the SPARC architecture: we installed the latest Solaris 10 on a 64-bit Ultra 60 SPARC machine. Since Solaris 10 is said to run native Linux applications: can I just download any R binaries for Linux? If yes, for which distribution? Or are there any Solaris 10 binaries? Or do I need to compile R myself on the SPARC? sorry for this beginner question, but I am grateful for any small hint. cheers christoph
[R] label outliers in boxplot and/or bwplot
Hi Is there a way to label (e.g. with the observation number) the outliers in a boxplot? And in a bwplot? thanks a lot Christoph P.S. identify() is not available with bwplot, is it?
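For base-graphics boxplots, a sketch: boxplot() invisibly returns the outlier values ($out) and the group each belongs to ($group), which text() can then label, here with observation numbers via match() (this assumes the data values are unique):

```r
set.seed(1)
x <- c(rnorm(50), 4, -5)        # two artificial outliers

b <- boxplot(x)
text(b$group + 0.1, b$out,      # just right of each flagged point
     labels = match(b$out, x))  # observation numbers of the outliers
```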
Re: [R] Handling large data sets via scan()
Does it solve your problem, at least in part, if you use read.table() instead of scan(), since it imports data directly into a data.frame? Let me know if it helps.
Nawaaz Ahmed wrote: I'm trying to read in datasets with roughly 150,000 rows and 600 features. I wrote a function using scan() to read it in (I have a 4GB linux machine) and it works like a charm. Unfortunately, converting the scanned list into a data.frame using as.data.frame() causes the memory usage to explode (it can go from 300MB for the scanned list to 1.4GB for a data.frame of 3 rows), and it fails claiming it cannot allocate memory (though it is still not close to the 3GB limit per process on my linux box - the message is "unable to allocate vector of size 522K"). So I have three questions:
1) Why is it failing even though there seems to be enough memory available?
2) Why is converting it into a data.frame causing the memory usage to explode? Am I using as.data.frame() wrongly? Should I be using some other command?
3) All the model-fitting packages seem to want data.frames as their input. If I cannot convert my list into a data.frame, what can I do? Is there any way of getting around this?
Much thanks! Nawaaz
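A further knob worth trying (a sketch with hypothetical file name, matching the 600-column layout described above): giving read.table the column classes and row count up front avoids type guessing, factor conversion and repeated reallocation, which is where much of the memory overhead comes from:

```r
d <- read.table("big.dat", header = TRUE,
                colClasses = c("character", rep("numeric", 599)),
                nrows = 150000,        # size hint: preallocate once
                comment.char = "")     # don't scan for comments
```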
[Fwd: Re: [R] vectorization of a data-aggregation loop]
great! many thanks, Phil. Cheers christoph
Phil Spector wrote: Christoph - I think reshape is the function you're looking for:
tt <- data.frame(cbind(c(1,1,1,1,1,2,2,2,3,3,3,3),
                       c(10,12,8,33,34,3,27,77,34,45,4,39),
                       c('a', 'b', 'b', 'a', 'c', 'c', 'c', 'a', 'b', 'a', 'b', 'c')))
names(tt) <- c("id", "iwv", "type")
reshape(aggregate(as.numeric(tt$iwv), list(id = tt$id, type = tt$type), sum),
        idvar = "id", timevar = "type", direction = "wide")
  id x.a x.b x.c
1  1   6  13   6
2  2  10  NA   7
3  3   9  14   7
- Phil Spector, Statistical Computing Facility, Department of Statistics, UC Berkeley, [EMAIL PROTECTED]
On Tue, 1 Feb 2005, Christoph Lehmann wrote: Hi I have a simple question: the following data.frame

   id iwv type
1   1   1    a
2   1   2    b
3   1  11    b
4   1   5    a
5   1   6    c
6   2   4    c
7   2   3    c
8   2  10    a
9   3   6    b
10  3   9    a
11  3   8    b
12  3   7    c

shall be aggregated into the form:

   id t.a t.b t.c
1   1   6  13   6
6   2  10   0   7
9   3   9  14   7

means for each 'type' (a, b, c) a new column is introduced which gets the sum of iwv for the respective observations 'id'. Of course I can do this transformation/aggregation in a loop (see below), but is there a way to do this more efficiently, e.g. using tapply (or something similar), since I have very many rows?
thanks for a hint christoph
#--
# the loop-way
[... loop code identical to the original posting below, snipped ...]
#--
[R] vectorization of a data-aggregation loop
Hi I have a simple question: the following data.frame

   id iwv type
1   1   1    a
2   1   2    b
3   1  11    b
4   1   5    a
5   1   6    c
6   2   4    c
7   2   3    c
8   2  10    a
9   3   6    b
10  3   9    a
11  3   8    b
12  3   7    c

shall be aggregated into the form:

   id t.a t.b t.c
1   1   6  13   6
6   2  10   0   7
9   3   9  14   7

means for each 'type' (a, b, c) a new column is introduced which gets the sum of iwv for the respective observations 'id'. Of course I can do this transformation/aggregation in a loop (see below), but is there a way to do this more efficiently, e.g. using tapply (or something similar), since I have very many rows? thanks for a hint christoph
#--
# the loop-way
t <- data.frame(cbind(c(1,1,1,1,1,2,2,2,3,3,3,3),
                      c(10,12,8,33,34,3,27,77,34,45,4,39),
                      c('a', 'b', 'b', 'a', 'c', 'c', 'c', 'a', 'b', 'a', 'b', 'c')))
names(t) <- c("id", "iwv", "type")
t$iwv <- as.numeric(t$iwv)
t
# define the additional columns (t.a, t.b, t.c)
tt <- rep(0, nrow(t) * length(levels(t$type)))
dim(tt) <- c(nrow(t), length(levels(t$type)))
tt <- data.frame(tt)
dimnames(tt)[[2]] <- paste("t.", levels(t$type), sep = "")
t <- cbind(t, tt)
t
obs <- 0
obs.previous <- 0
row.elim <- rep(FALSE, nrow(t))
ta <- which(names(t) == "t.a") # number of the column which codes the first type
r.ctr <- 0
for (i in 1:nrow(t)) {
  obs <- t[i,]$id
  if (obs == obs.previous) {
    row.elim[i] <- TRUE
    r.ctr <- r.ctr + 1 # increment
    type.col <- as.numeric(t[i,]$type)
    t[i - r.ctr, ta - 1 + type.col] <- t[i - r.ctr, ta - 1 + type.col] + t[i,]$iwv
  } else {
    r.ctr <- 0 # record counter
    type.col <- as.numeric(t[i,]$type)
    t[i, ta - 1 + type.col] <- t[i,]$iwv
  }
  obs.previous <- obs
}
t <- t[!row.elim,]
t <- subset(t, select = -c(iwv, type))
t
#--
Re: [R] several boxplots or bwplots into one graphic
many thanks for your great tip. I didn't know reshape. Unfortunately, in my real data the values of my variables are not all within the same range. Therefore, what shall I change in the code to get for each plot a scale which is adjusted to the range of my variable? thanks a lot christoph
Chuck Cleland wrote: Your variables (var.*) seem to be on the same scale. How about reshaping the data into a univariate layout and then using bwplot as follows:
mydata <- data.frame(ID = 1:20, A = runif(20), B = runif(20), C = runif(20),
                     GROUP = rep(c(0,1), c(10,10)))
mydata.uni <- reshape(mydata, varying = list(c("A", "B", "C")),
                      v.names = "Y", timevar = "VAR",
                      times = c("A", "B", "C"), direction = "long")
library(lattice)
bwplot(Y ~ as.factor(GROUP) | VAR, data = mydata.uni, layout = c(3,1,1),
       xlab = "Group")
hope this helps, Chuck Cleland
Christoph Lehmann wrote: Hi I have 10 variables and 2 groups. I know how to plot a bwplot for ONE var. [...] How can I plot 10 bwplots (or boxplots) automatically into one graphic? Is there any function from lattice which can do this? thanks for a short hint christoph
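For the follow-up question about differing ranges, a sketch: lattice's scales argument can give every panel its own y-axis (this reuses the mydata.uni object from the reply):

```r
bwplot(Y ~ as.factor(GROUP) | VAR, data = mydata.uni,
       layout = c(3, 1, 1), xlab = "Group",
       scales = list(y = list(relation = "free")))  # per-panel y-range
```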
[R] several boxplots or bwplots into one graphic
Hi I have 10 variables and 2 groups. I know how to plot a bwplot for ONE var, e.g.

var.a var.b var.c .. GROUP
  0.2   0.5   0.2 ..     0
  0.3   0.2   0.2 ..     0
  ..
  0.1   0.8   0.7 ..     1
  0.5   0.5   0.1 ..     1
  ..

bwplot(var.a ~ GROUP, data = my.data)
How can I plot 10 bwplots (or boxplots) automatically into one graphic? Is there any function from lattice which can do this? thanks for a short hint christoph
[R] classification for huge datasets: SVM yields memory troubles
Hi I have a matrix with 30 observations and roughly 3 variables, each obs belonging to one of two groups. With svm and slda I get into memory trouble ('cannot allocate vector of size' roughly 2G). PCA + LDA runs fine. Is there any way around the memory issue with SVMs? Or can you recommend any other classification method for such huge datasets? P.S. I run SuSE 9.1 on a PIV machine with 2G RAM. thanks for a hint Christoph
Re: [R] About ROC curves
the package ROC from Bioconductor, e.g.: http://www.bioconductor.org/repository/release1.5/package/Win32/ Cheers! Christoph
Xin Qi wrote: Hi, dear all R users: Does someone know whether R can calculate Receiver Operating Characteristic (ROC) curves? I didn't find it in the packages. Thanks a lot. --- Xin
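If installing a package is not an option, a bare-bones ROC curve can also be computed by hand; a sketch with made-up scores and 0/1 labels:

```r
scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3)  # classifier output
labels <- c(1, 1, 0, 1, 0, 1, 0, 0)                   # true classes

# sweep a threshold over the observed scores, highest first
cuts <- sort(unique(scores), decreasing = TRUE)
tpr  <- sapply(cuts, function(th) mean(scores[labels == 1] >= th))
fpr  <- sapply(cuts, function(th) mean(scores[labels == 0] >= th))

plot(c(0, fpr), c(0, tpr), type = "b", xlab = "FPR", ylab = "TPR")
```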
[R] LDA with previous PCA for dimensionality reduction
Dear all, not really an R question, but: if I want to check the classification accuracy of an LDA with a previous PCA for dimensionality reduction by means of the LOOCV method: is it OK to do the PCA on the WHOLE dataset ONCE and then run the LDA with the CV option set to TRUE (which runs LOOCV)
-- OR --
do I need
- to compute for each 'training bag' (the n-1 observations) a PCA (-> my.princomp.1),
- then run the LDA on the training-bag scores (-> my.lda.1),
- then compute the scores of the left-out observation using my.princomp.1 (-> my.scores.2),
- and only then use predict.lda(my.lda.1, my.scores.2) on the scores of the left-out observation?
I read some articles where they chose procedure 1, but I am not sure whether this is really correct. many thanks for a hint Christoph
Re: [R] LDA with previous PCA for dimensionality reduction
Thank you, Torsten; that's what I thought: as long as one does not use the class label as a constraint in the dimension reduction, the procedure is OK. Of course it is computationally more demanding, since for each new observation (unknown with respect to the class label) one has to compute a new PCA as well. Cheers Christoph
Torsten Hothorn wrote: On Wed, 24 Nov 2004, Ramon Diaz-Uriarte wrote: Dear Christoph, I guess you want to assess the error rate of an LDA that has been fitted to a set of currently existing training data, and that in the future you will get some new observation(s) for which you want to make a prediction. Then, I'd say that you want to use the second approach. You might find that the first step turns out to be crucial and, after all, your whole subsequent LDA is contingent on the PC scores you obtain in the previous step.
Ramon, as long as one does not use the information in the response (the class variable, in this case) I don't think that one ends up with an optimistically biased estimate of the error (although leave-one-out is a suboptimal choice). Of course, when one starts to tune the method used for dimension reduction, a selection of the procedure with minimal error will produce a bias. Or am I missing something important? Btw, `ipred::slda' implements something not completely unlike the procedure Christoph is interested in. Best, Torsten
Somewhat similar issues have been discussed in the microarray literature. Two references are:

@ARTICLE{ambroise-02,
  author  = {Ambroise, C. and McLachlan, G. J.},
  title   = {Selection bias in gene extraction on the basis of microarray gene-expression data},
  journal = {Proc Natl Acad Sci USA},
  year    = {2002},
  volume  = {99},
  number  = {10},
  pages   = {6562--6566}
}

@ARTICLE{simon-03,
  author  = {Simon, R. and Radmacher, M. D. and Dobbin, K. and McShane, L. M.},
  title   = {Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification},
  journal = {Journal of the National Cancer Institute},
  year    = {2003},
  volume  = {95},
  number  = {1},
  pages   = {14--18}
}

I am not sure, though, why you use PCA followed by LDA. But that's another story. Best, R.
On Wednesday 24 November 2004 11:16, Christoph Lehmann wrote: Dear all, not really an R question, but: [...] I read some articles where they chose procedure 1, but I am not sure whether this is really correct. many thanks for a hint Christoph
-- Ramón Díaz-Uriarte, Bioinformatics Unit, Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center), Melchor Fernández Almagro, 3, 28029 Madrid (Spain), Fax: +-34-91-224-6972, Phone: +-34-91-224-6900, http://ligarto.org/rdiaz, PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc)
[R] biplot.princomp with loadings only
Hi is there a way to plot only the loadings in a biplot (with the nice arrows) and to skip the scores? thanks christoph
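One way (a sketch, using the built-in USArrests data for illustration): skip biplot() entirely and draw the loadings yourself with arrows():

```r
pc <- princomp(USArrests, cor = TRUE)
ld <- pc$loadings[, 1:2]            # loadings on the first two components

plot(ld, type = "n", xlab = "Comp.1", ylab = "Comp.2")  # empty canvas
arrows(0, 0, ld[, 1], ld[, 2], length = 0.1)            # one arrow per variable
text(ld[, 1] * 1.1, ld[, 2] * 1.1, rownames(ld))        # variable names
```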
[R] glm.fit and predict.glm: error ' no terms component'
Hi when I fit a glm by glm.fit(x, y, family = binomial()) and then try to use the object for prediction on new data by predict.glm(object, newdata), I get the error:
Error in terms.default(object) : no terms component
I know I can use glm() and a formula, but for my case I prefer glm.fit(x, y)... thanks for a hint christoph

$platform
[1] "i686-pc-linux-gnu"
$arch
[1] "i686"
$os
[1] "linux-gnu"
$system
[1] "i686, linux-gnu"
$status
[1] ""
$major
[1] "1"
$minor
[1] "9.1"
$year
[1] "2004"
$month
[1] "06"
$day
[1] "21"
$language
[1] "R"
Re: [R] glm.fit and predict.glm: error ' no terms component'
many thanks. I did it the following way, based on Thomas' suggestion:

predict.glm.fit <- function(glmfit, newmatrix) {
  newmatrix <- cbind(1, newmatrix)
  coef <- rbind(1, as.matrix(glmfit$coef))
  eta <- as.matrix(newmatrix) %*% as.matrix(coef)
  exp(eta)/(1 + exp(eta))
}

cheers, christoph

Thomas Lumley wrote: On Wed, 29 Sep 2004, Christoph Lehmann wrote: Hi when I fit a glm by glm.fit(x, y, family = binomial()) and then try to use the object for prediction of newdata by predict.glm(object, newdata) I get the error: Error in terms.default(object) : no terms component. I know I can use glm() and a formula, but for my case I prefer glm.fit(x, y)...

Well, you can't use predict.glm that way. As the function name suggests, it is a predict method for objects of class "glm", which in your case you do not have. There are two reasons why it won't work. For type = "terms" the formula is needed to identify terms, and for any type of prediction the formula is needed to convert the data frame newdata into a model matrix. You would need to write a function where the new data is a model matrix. If you only need point predictions then

predict_glm_fit <- function(glmfit, newmatrix, addintercept = TRUE) {
  if (addintercept) newmatrix <- cbind(1, newmatrix)
  eta <- newmatrix %*% glmfit$coef
  glmfit$family$linkinv(eta)
}

would work. -thomas

__ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
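A self-contained sketch of the approach Thomas describes (the simulated data, the helper name, and the use of the family stored inside the glm.fit result are my own choices, not from the thread). glm.fit keeps no terms or formula, so prediction has to go through a model matrix built by hand:

```r
# Predict on new data from a glm.fit object by building the linear
# predictor manually from a model matrix.
set.seed(1)
x <- cbind(1, rnorm(50))            # model matrix incl. intercept column
y <- rbinom(50, 1, plogis(0.5 + 1.5 * x[, 2]))
fit <- glm.fit(x, y, family = binomial())

predict_glm_fit <- function(glmfit, newmatrix) {
  eta <- newmatrix %*% glmfit$coefficients
  glmfit$family$linkinv(eta)        # inverse link: here plogis()
}

newx <- cbind(1, c(-1, 0, 1))
p <- predict_glm_fit(fit, newx)     # fitted probabilities for 3 new rows

# the same fit via glm() + formula, for comparison
fit2 <- glm(y ~ x[, 2], family = binomial())
```

The coefficients of `fit` and `fit2` agree to numerical precision, since glm() calls glm.fit on the same design matrix internally.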
[R] nnet with weights parameter: odd error
Dear R-users,

I use nnet for a classification (2 classes) problem, with the code CVnn1/CVnn2 as described in VR. The one thing I changed in the code: I define the (class) weight for each observation in each cv 'bag' and pass the vector of weights as the weights parameter of nnet(..., weights = weight.vector, ...). Unfortunately I get an error during some (but not all!) inner-fold cv runs:

Error in model.frame(formula, rownames, variables, varnames, extras, extranames, : variable lengths differ

If you just remove the weights parameter in nnet() it runs fine!! I debugged the code but could not resolve the problem. It is really very strange and I need your help! To keep it simple, I defined the weights as 1 for each observation (as is the default):

CVnn2 <- function(formula, data, size = c(0,4,4,10,10),
                  lambda = c(0, rep(c(0.001, 0.01), 2)),
                  nreps = 1, nifold = 5, verbose = 99, ...)
{
  resmatrix <- function(predict.matrix, learn, data, ri, i) {
    rae.matrix <- predict.matrix
    rae.matrix[,] <- 0
    rae.vector <- as.numeric(as.factor(predict(learn, data[ri == i,], type = "class")))
    for (k in 1:dim(rae.matrix)[1]) {
      if (rae.vector[k] == 1) rae.matrix[k,1] <- rae.matrix[k,1] + 1
      else rae.matrix[k,2] <- rae.matrix[k,2] + 1
    }
    rae.matrix
  }
  CVnn1 <- function(formula, data, nreps = 1, ri, verbose, ...) {
    totalerror <- 0
    truth <- data[, deparse(formula[[2]])]
    res <- matrix(0, nrow(data), length(levels(truth)))
    if (verbose > 20) cat("  inner fold")
    for (i in sort(unique(ri))) {
      if (verbose > 20) cat(" ", i, sep = "")
      data.training <- data[ri != i,]$GROUP
      weight.vector <- rep(1, dim(data[ri != i,])[1])
      for (rep in 1:nreps) {
        learn <- nnet(formula, data[ri != i,], weights = weight.vector, trace = F, ...)
        #res[ri == i,] <- res[ri == i,] + predict(learn, data[ri == i,])
        res[ri == i,] <- res[ri == i,] + resmatrix(res[ri == i,], learn, data, ri, i)
      }
    }
    if (verbose > 20) cat("\n")
    sum(as.numeric(truth) != max.col(res/nreps))
  }
  truth <- data[, deparse(formula[[2]])]
  res <- matrix(0, nrow(data), length(levels(truth)))
  choice <- numeric(length(lambda))
  for (i in sort(unique(rand))) {
    if (verbose > 0) cat("fold ", i, "\n", sep = "")
    set.seed(i*i)
    ri <- sample(nifold, sum(rand != i), replace = T)
    for (j in seq(along = lambda)) {
      if (verbose > 10) cat("  size =", size[j], "decay =", lambda[j], "\n")
      choice[j] <- CVnn1(formula, data[rand != i,], nreps = nreps, ri = ri,
                         size = size[j], decay = lambda[j], verbose = verbose, ...)
    }
    decay <- lambda[which.is.max(-choice)]
    csize <- size[which.is.max(-choice)]
    if (verbose > 5) cat("  #errors:", choice, " ")
    # if (verbose > 1) cat("  chosen size = ", csize, " decay = ", decay, "\n", sep = "")
    for (rep in 1:nreps) {
      data.training <- data[rand != i,]$GROUP
      weight.vector <- rep(1, dim(data[rand != i,])[1])
      learn <- nnet(formula, data[rand != i,], weights = weight.vector,
                    trace = F, size = csize, decay = decay, ...)
      #res[rand == i,] <- res[rand == i,] + predict(learn, data[rand == i,])
      res[rand == i,] <- res[rand == i,] + resmatrix(res[rand == i,], learn, data, rand, i)
    }
  }
  factor(levels(truth)[max.col(res/nreps)], levels = levels(truth))
}

res.nn2 <- CVnn2(GROUP ~ ., rae.data.subsetted1, skip = T, maxit = 500, nreps = cv.repeat)
con(true = rae.data.subsetted$GROUP, predicted = res.nn2)

### Coordinates: i686-pc-linux-gnu, linux-gnu; R 1.9.1 (2004-06-21)

Thanks a lot. Best regards, Christoph

-- Christoph Lehmann, Department of Psychiatric Neurophysiology, University Hospital of Clinical Psychiatry, Waldau, CH-3000 Bern 60; Phone: ++41 31 930 93 83, Mobile: ++41 76 570 28 00, Fax: ++41 31 930 99 61, [EMAIL PROTECTED], http://www.puk.unibe.ch/cl/pn_ni_cv_cl_03.html

__ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] nnet and weights: error analysis using VR example
Dear R-users, dear Prof. Ripley (as package maintainer),

I tried to investigate the odd error that occurs when I call nnet together with a 'weights' parameter, using the 'fgl' example in VR p. 348. The error I get is:

Error in eval(expr, envir, enclos) : Object "w" not found

I think it is a kind of scoping problem, but I really cannot see what the problem exactly is. Here is my code: the only thing that changed is the definition of a weight parameter ('w') which is given to the nnet call. Of course the weight vector of '1's makes no sense here, but it will be defined according to the class sizes later.

###
library(MASS)
data(fgl)

con <- function(...) {
  print(tab <- table(...))
  diag(tab) <- 0
  cat("error rate = ", round(100*sum(tab)/length(list(...)[[1]]), 2), "%\n")
  invisible()
}

set.seed(123)
rand <- sample(10, dim(fgl)[1], replace = T)
fgl1 <- fgl
fgl1[1:9] <- lapply(fgl[, 1:9], function(x) { r <- range(x); (x - r[1])/diff(r) })

CVnn2 <- function(formula, data, size = c(0,4,4,10,10),
                  lambda = c(0, rep(c(0.001, 0.01), 2)),
                  nreps = 1, nifold = 5, verbose = 99, ...)
{
  CVnn1 <- function(formula, data, nreps = 1, ri, verbose, ...) {
    totalerror <- 0
    truth <- data[, deparse(formula[[2]])]
    res <- matrix(0, nrow(data), length(levels(truth)))
    if (verbose > 20) cat("  inner fold")
    for (i in sort(unique(ri))) {
      if (verbose > 20) cat(" ", i, sep = "")
      data.training <- data[ri != i,]$GROUP
      w <- rep(1, dim(data[ri != i,])[1])
      for (rep in 1:nreps) {
        learn <- nnet(formula, data[ri != i,], weights = w, trace = F, ...)
        res[ri == i,] <- res[ri == i,] + predict(learn, data[ri == i,])
      }
    }
    if (verbose > 20) cat("\n")
    sum(as.numeric(truth) != max.col(res/nreps))
  }
  truth <- data[, deparse(formula[[2]])]
  res <- matrix(0, nrow(data), length(levels(truth)))
  choice <- numeric(length(lambda))
  for (i in sort(unique(rand))) {
    if (verbose > 0) cat("fold ", i, "\n", sep = "")
    set.seed(i*i)
    ri <- sample(nifold, sum(rand != i), replace = T)
    for (j in seq(along = lambda)) {
      if (verbose > 10) cat("  size =", size[j], "decay =", lambda[j], "\n")
      choice[j] <- CVnn1(formula, data[rand != i,], nreps = nreps, ri = ri,
                         size = size[j], decay = lambda[j], verbose = verbose, ...)
    }
    decay <- lambda[which.is.max(-choice)]
    csize <- size[which.is.max(-choice)]
    if (verbose > 5) cat("  #errors:", choice, " ")
    # if (verbose > 1) cat("  chosen size = ", csize, " decay = ", decay, "\n", sep = "")
    for (rep in 1:nreps) {
      data.training <- data[rand != i,]$GROUP
      w <- rep(1, dim(data[rand != i,])[1])
      learn <- nnet(formula, data[rand != i,], weights = w, trace = F,
                    size = csize, decay = decay, ...)
      res[rand == i,] <- res[rand == i,] + predict(learn, data[rand == i,])
    }
  }
  factor(levels(truth)[max.col(res/nreps)], levels = levels(truth))
}

res.nn2 <- CVnn2(type ~ ., fgl1, skip = T, maxit = 500, nreps = 10)
con(true = fgl$type, predicted = res.nn2)
##

many thanks for your help, Christoph

### Coordinates: i686-pc-linux-gnu, linux-gnu; R 1.9.1 (2004-06-21)

-- Christoph Lehmann, Department of Psychiatric Neurophysiology, University Hospital of Clinical Psychiatry, Waldau, CH-3000 Bern 60; Phone: ++41 31 930 93 83, Mobile: ++41 76 570 28 00, Fax: ++41 31 930 99 61, [EMAIL PROTECTED], http://www.puk.unibe.ch/cl/pn_ni_cv_cl_03.html

__ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] memory problem under windows
I have (still) some memory problems when trying to allocate a huge array. WinXP pro with 2G RAM; I start R by calling Rgui.exe --max-mem-size=2Gb (as pointed out in the R for Windows FAQ). R.Version(): i386-pc-mingw32, 1.9.1, 2004-06-21.

## and here the problem
x.dim <- 46
y.dim <- 58
slices <- 40
volumes <- 1040
a <- rep(0, x.dim * y.dim * slices * volumes)
dim(a) <- c(x.dim, y.dim, slices, volumes)

gives me:

Error: cannot allocate vector of size 850425 Kb

even though memory.limit(size = NA) yields 2147483648 and memory.size() gives 905838768. So why is that, and what can I do against it? Many thanks for your kind help. Cheers, Christoph

-- Christoph Lehmann, Department of Psychiatric Neurophysiology, University Hospital of Clinical Psychiatry, Waldau, CH-3000 Bern 60; Phone: ++41 31 930 93 83, Mobile: ++41 76 570 28 00, Fax: ++41 31 930 99 61, [EMAIL PROTECTED], http://www.puk.unibe.ch/cl/pn_ni_cv_cl_03.html

__ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
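For orientation, a back-of-the-envelope calculation (my own, not from the thread): an array of doubles of that size needs about x.dim * y.dim * slices * volumes * 8 bytes, and R must find it as one contiguous block, so allocation can fail well below the --max-mem-size limit when the address space is fragmented:

```r
# Approximate size of the requested array of doubles
x.dim <- 46; y.dim <- 58; slices <- 40; volumes <- 1040
n.cells <- x.dim * y.dim * slices * volumes   # 110,988,800 cells
size.mb <- n.cells * 8 / 2^20                 # 8 bytes per double
round(size.mb)                                # roughly 850 Mb, in one block
```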
[R] Variable Importance in pls: R or B? (and in glpls?)
Dear R-users, dear Ron I use pls from the pls.pcr package for classification. Since I need to know which variables are most influential onto the classification performance, what criteria shall I look at: a) B, the array of regression coefficients for a certain model (means a certain number of latent variables) (and: squared or absolute values?) OR b) the weight matrix RR (or R in the De Jong publication; in Ding Gentleman this is the P Matrix and called 'loadings')? (and again: squared or absolute values?) and what about glpls (glpls1a) ? shall I look at the 'coefficients' (regression coefficients)? Thanks for clarification Christoph __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] locator() in a multi-figure setting using mfrow(): SOLVED
thanks to some great hints by Paul Murrell I could solve it. Here is one solution (the code needs to be cleaned and simplified, but maybe one can understand it):

###
## create a multi-figure setting
nr <- 4
nc <- 2
opar <- par(mfrow = c(nr, nc))
slices <- 8
m <- matrix(runif(100), 10, 10)
my.list <- list()
for (slice in 1:slices) {
  my.list[[slice]] <- m
}
for (slice in 1:slices) {
  x <- 1*(1:25)
  y <- 1*(1:25)
  z <- my.list[[slice]]
  image(list(x = 0:9, y = 0:9, z = z))
}

my.get.coord <- function() {
  par(mfg = c(1,1))          # locator() shall be relative to the first plot
                             # out of the eight plots
  my.loc <- locator(1)       # location in plot units, not inches
  my.plot.region <- par("usr")   # extremes of the plotting region
                                 # (in plot units, not inches)
  my.plot.region.x <- my.plot.region[2] - my.plot.region[1]
  my.plot.region.y <- my.plot.region[4] - my.plot.region[3]
  # par("pin"): current plot dimension in inches; coordinates are relative to
  # the plot-region bottom-left corner, not the axis c(0,0) point
  my.loc.inch.x <- (my.loc$x + 0.5)/my.plot.region.x * par("pin")[1]
  my.loc.inch.y <- (my.loc$y + 0.5)/my.plot.region.y * par("pin")[2]
  ## search the plot we are in with locator(1)
  my.plot.inch.x <- par("pin")[1] + par("mai")[2] + par("mai")[4]
                             # plot.x + left/right margin = par("fin")[1]
  my.plot.inch.y <- par("pin")[2] + par("mai")[1] + par("mai")[3]
                             # plot.y + bottom/top margin = par("fin")[2]
  pos.rel.x <- (my.loc.inch.x / par("fin")[1] - floor(my.loc.inch.x / par("fin")[1])) *
    par("fin")[1] / par("pin")[1] * (par("usr")[2] - par("usr")[1]) - 0.5
  # position within the target plot region (c(0,0) is the plot-region
  # bottom-left corner, not the axis c(0,0) point)
  pos.rel.y <- (my.loc.inch.y / par("fin")[2] - floor(my.loc.inch.y / par("fin")[2])) *
    par("fin")[2] / par("pin")[2] * (par("usr")[4] - par("usr")[3]) - 0.5
  fig.coord.x <- ceiling(my.loc.inch.x / par("fin")[1])
  fig.coord.y <- 1 + (-1) * ceiling(my.loc.inch.y / par("fin")[2])
  # cat("figure-coord x: ", fig.coord.x, "\n")
  # cat("figure-coord y: ", fig.coord.y, "\n")
  cat("we are in figure: ", fig.coord.y * nc + fig.coord.x, "\n")
  cat("coordinates of the identified point x: ", round(pos.rel.x), "\n")
  cat("coordinates of the identified point y: ", round(pos.rel.y), "\n")
}
my.get.coord()

Christoph __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] locator() in a multi-figure setting using mfrow()
Hi, based on some code from Thomas Petzoldt (see below), I have a question about how to use locator() in an mfrow() multi-figure setting. I am sure this is the kind of problem other people face too. We have 8 matrices, each with 10x10 fields, plotted as mfrow = c(2,4). How can I get, using locator(), the correct index of a field of one of the 8 matrices? That means I should get 3 values: the matrix I am in (a value between 1..8), and the corresponding x and y coordinates (each in 1..10). Many thanks for your kind help.

---
opar <- par(mfrow = c(2,4))
slices <- 8
m <- matrix(runif(100), 10, 10)
my.list <- list()
for (slice in 1:slices) {
  my.list[[slice]] <- m
}
for (slice in 1:slices) {
  x <- 1*(1:25)
  y <- 1*(1:25)
  z <- my.list[[slice]]
  image(list(x = 0:9, y = 0:9, z = z))
}
par(opar)  # restore device parameters
p <- locator(1)
c(round(p$x), round(p$y))
---

how can I get the correct location in the sense of a 3d info: (a) which slice (p$slice), (b) p$x, (c) p$y, so that it could be used in the sense of my.list[[p$slice]][round(p$x), round(p$y)]? christoph __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] locator() in a multi-figure setting using mfrow()
I know that I can use par(mfg = c(i,j)) to get the correct x,y coordinates of one of the 8 matrices/subimages, but how can I get the i and the j, i.e. how can I know in which of the 8 images I am clicking? thanks, Christoph

Christoph Lehmann wrote: Hi, based on some code from Thomas Petzoldt (see below), I have a question about how to use locator() in an mfrow() multi-figure setting. I am sure this is the kind of problem other people face too. We have 8 matrices, each with 10x10 fields, plotted as mfrow = c(2,4). How can I get, using locator(), the correct index of a field of one of the 8 matrices? That means I should get 3 values: the matrix I am in (a value between 1..8), and the corresponding x and y coordinates (each in 1..10). Many thanks for your kind help.

---
opar <- par(mfrow = c(2,4))
slices <- 8
m <- matrix(runif(100), 10, 10)
my.list <- list()
for (slice in 1:slices) {
  my.list[[slice]] <- m
}
for (slice in 1:slices) {
  x <- 1*(1:25)
  y <- 1*(1:25)
  z <- my.list[[slice]]
  image(list(x = 0:9, y = 0:9, z = z))
}
par(opar)  # restore device parameters
p <- locator(1)
c(round(p$x), round(p$y))
---

how can I get the correct location in the sense of a 3d info: (a) which slice (p$slice), (b) p$x, (c) p$y, so that it could be used in the sense of my.list[[p$slice]][round(p$x), round(p$y)]? christoph

__ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] huge array with floats: allocation error
Hi, after searching through a couple of documents and the mailing list, I dare to ask it here. I need to define an array of size 64 x 64 x 16 x 1000 for single-precision floating-point numbers. With 1G RAM I always get the error:

cannot allocate vector of size 458752 Kb
reached total allocation of 1022MB: see help(memory.size)

I consulted memory.size() but it didn't help me. So my question: I know that there is NO float type in R. Is there any way to solve my problem without increasing the RAM? many thanks, Cheers! Christoph __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] huge array with floats: allocation error
Thank you, Prof. Ripley.

"I don't believe you read the rw-FAQ as the posting guide asks, though. You seem to be working under Windows, without saying so (and the posting guide does ask you to). So that's `a couples of documents' worth `searching through'."

I apologize for not being more precise. Fact is:

x <- rep(0, 64 * 64 * 16 * 1000)
dim(x) <- c(64, 64, 16, 1000)

indeed DOES work (I tried it on several machines):
(i) a Linux SuSE 9.1 box with 1G RAM
(ii) Windows XP with 1G RAM
(iii) Windows XP with 2G RAM
all with R version 1.9.1. But:

tst.array <- array(0, c(64, 64, 16, 1000))
Error: cannot allocate vector of size 512000 Kb

does not work on any of these machine/RAM combinations. Why is this? Many thanks for a further hint. I am sure I overlooked something very basic; forgive me. Christoph __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
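A plausible explanation (my reading, not a reply from the thread): array(0, dim) first builds the full data vector via rep() and then copies it into the array it returns, so at its peak it needs room for two copies of the 512000 Kb vector, while assigning dim() to an already allocated vector only attaches an attribute and needs no second copy. A scaled-down sketch of the two variants (smaller dimensions so it runs anywhere):

```r
# Variant 1: one allocation, then set the dim attribute in place
x <- rep(0, 64 * 64 * 16 * 10)
dim(x) <- c(64, 64, 16, 10)

# Variant 2: array() builds the same object, but via an intermediate
# copy, briefly doubling the memory needed
y <- array(0, c(64, 64, 16, 10))
```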
[R] association rules in R
Hi I am interested in data mining problems. Has anybody ever programmed and worked with association rules in R? I am very grateful for any hint. Best regards Christoph __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] vectorizing a matrix computation
Dear R users,

I have a 4-dimensional array (actually several 3d (x, y, slices) matrices appended over time (volumes)). Say I want to z-transform the data (subtract the mean and divide by the standard deviation) over the time dimension:

for (slice in 1:slices) {
  for (x in 1:x.dim) {
    for (y in 1:y.dim) {
      t <- as.matrix(my.matrix[x,y,slice,1:volumes])
      for (vol in 1:volumes) {
        my.matrix.transformed[x,y,slice,vol] <-
          (my.matrix[x,y,slice,vol] - mean(t))/sqrt(var(t))
      }
    }
  }
}

How can I vectorize such a function using one of the *apply functions? many thanks, Cheers, Christoph __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
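One way to vectorize the loops above (a sketch with simulated data, not from a reply in the thread): compute the mean and standard deviation once per (x, y, slice) cell with apply() over margins 1:3, then subtract and divide with sweep() over the same margins:

```r
# z-transform over the 4th (volume/time) dimension, vectorized
set.seed(42)
my.matrix <- array(rnorm(4 * 3 * 2 * 10), c(4, 3, 2, 10))

m <- apply(my.matrix, 1:3, mean)   # mean of each time series
s <- apply(my.matrix, 1:3, sd)     # sd of each time series
my.matrix.transformed <- sweep(sweep(my.matrix, 1:3, m, "-"), 1:3, s, "/")
```

After this, every time series my.matrix.transformed[x, y, slice, ] has mean 0 and sd 1.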
Re: [R] multiple hierarchical anova models
Hi, I can recommend two documents:
a) http://www.psych.upenn.edu/~baron/rpsych/rpsych.html
b) http://www.pallier.org/ressources/stats_with_R/stats_with_R.pdf (in French)
Let me know whether this helped you. cheers, christoph

Matthias Unterhuber wrote: Hello, my name is Matthias and I am looking for syntax regarding hierarchical anova models in R. How can I express that a factor is nested within the combination of two other factors, A(B,C), e.g. for aov(...)? I did not find the corresponding expression. Furthermore, I wanted to ask whether block factors have to be specified in a specific way, or whether they are just treated as other factors (with no interactions). In general, an overview would be useful for beginners that describes the structural equations of more complicated anova designs (hierarchical and block-factor designs...) in the syntax of R. Best wishes and thanks, Matthias

__ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] locator() in a multiple plot setting
Hi, based on some code from Thomas Petzoldt, I have a question:

---
opar <- par(mfrow = c(2,4))
slices <- 8
m <- matrix(runif(100), 10, 10)
my.list <- list()
for (slice in 1:slices) {
  my.list[[slice]] <- m
}
for (slice in 1:slices) {
  x <- 1*(1:25)
  y <- 1*(1:25)
  z <- my.list[[slice]]
  image(list(x = 0:9, y = 0:9, z = z))
}
par(opar)  # restore device parameters
p <- locator(1)
c(round(p$x), round(p$y))
---

how can I get the correct location in the sense of a 3d info: (a) which slice (p$slice), (b) p$x, (c) p$y, so that it could be used in the sense of my.list[[p$slice]][round(p$x), round(p$y)]? many thanks, Cheers, christoph __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] paired t-test with bootstrap
just a hint for further bootstrapping examples (worked out with R): "Bootstrap Methods and Their Application" by A.C. Davison and D.V. Hinkley. cheers, christoph

luciana wrote: Dear Sirs, I am an R beginner: by means of R I would like to apply the bootstrap to my data in order to test cost differences between independent or paired samples of people affected by a certain disease. My problem is that even though I am reading the book by Efron (An Introduction to the Bootstrap), looking at the examples on the internet and available in R, and learning a lot of theoretical things about the bootstrap, I can't apply the bootstrap with R to my data because of many doubts and difficulties. This is the reason why I have decided to ask the experts for help. I have a sample of diabetic people, matched (by age and sex) with a control sample. The variable I would like to compare is their monthly drug and hospital cost. The cost variable has a distribution very far from Gaussian, but I nevertheless need to compare the means between the two groups. So, in the specific case of a paired-sample t-test, I aim at testing whether the difference in cost is close to 0. What is the better way to follow for that? Another question is that sometimes I have missing data in my dataset (for example, I have the cost for a patient but not for a control). If I introduce NA or a dot, R doesn't estimate the statistic I need (for instance, the mean). To overcome this problem I have replaced the missing data with the mean computed from the remaining part of the data. Anyway, I think R can actually compute the mean even in the presence of missing data. Is that right? What can I do? Thank you very much for your attention and, I hope, your help. Best wishes, Luciana Scalone, Center of Pharmacoeconomics, University of Milan

[[alternative HTML version deleted]]

__ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] pixmapIndexed color question
Hi, I use pixmapIndexed:

tmp.vimp <- array(0, c(x.dim, y.dim))
tmp.vimp <- pixmapIndexed(tmp.vimp, col = rainbow)

to plot the values of a 2D matrix. I 'fill' the pixmapIndexed like:

for (x in 1:x.dim) {
  for (y in 1:y.dim) {
    tmp.vimp@index[x,y] <- my.matrix[x,y]
  }
}

How can I specify that the colors are painted e.g. according to the rainbow palette? plot(tmp.vimp) paints all 'pixels' in red, even though I specified col = rainbow (see above). many thanks, cheers, christoph

p.s. is there an easier method for 'painting' the values of a 2d matrix? __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
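On the postscript question, a simpler sketch (my suggestion, not a reply from the thread): image() paints a matrix directly and takes a vector of colors, so no pixmap object is needed. Note that rainbow is a function and has to be called, e.g. rainbow(64), to produce that color vector:

```r
# Paint the values of a 2D matrix directly with image();
# col wants a color vector, hence rainbow(64), not rainbow
my.matrix <- outer(1:20, 1:30, function(x, y) sin(x/3) * cos(y/5))

# transpose and flip rows so the plot appears in matrix reading order
image(t(my.matrix)[, nrow(my.matrix):1], col = rainbow(64))
```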
[R] image NAs error
tmp.vimp <- matrix(NA, nrow = x.dim, ncol = y.dim)
tmp.vimp <- image(tmp.vimp, col = rainbow)

gives:

Error in image.default(tmp.vimp, col = rainbow) : invalid z limits
In addition: Warning messages:
1: no finite arguments to min; returning Inf
2: no finite arguments to max; returning -Inf

even though NAs are allowed in image. What went wrong here? thank you, Christoph __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
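The proximate cause (my reading of the error messages, not a reply from the thread): NAs are allowed in image(), but with an all-NA matrix there is no finite data range from which to derive the z limits, so zlim must be supplied explicitly; NA cells are then simply left blank. Also note that col wants a vector of colors such as rainbow(12), not the rainbow function itself:

```r
# An all-NA matrix has no finite range, so supply zlim yourself
tmp.vimp <- matrix(NA_real_, nrow = 10, ncol = 10)
tmp.vimp[3, 4] <- 0.5                     # one finite cell for illustration
image(tmp.vimp, col = rainbow(12), zlim = c(0, 1))
```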
[R] sweave: graphics not at the expected location in the pdf
Hi, I use Sweave for excellent pdf output (thank you, Friedrich Leisch). I have just one problem: quite often the graphics are not at the place where I expect them, but appear later in the pdf, often on a separate page. How can I fix this, i.e. how can I specify that I want a graphic exactly here and now in the document? Many thanks and best regards, Christoph -- Christoph Lehmann [EMAIL PROTECTED] __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
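A common fix (standard LaTeX practice, not an archived reply): float placement is decided on the LaTeX side, so wrap the Sweave figure chunk in a figure environment and pin it down, e.g. with the float package's [H] specifier. The chunk name and plot are placeholders:

```latex
% In the .Rnw preamble:
\usepackage{float}

% Around the figure chunk:
\begin{figure}[H]
<<myplot, fig=TRUE, echo=FALSE>>=
plot(rnorm(100))
@
\caption{This figure stays exactly here instead of floating to a later page.}
\end{figure}
```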
Re: [R] Visual stimulus presentation using R?
Hi Christoph, I have never done such stuff with R (and I don't know, e.g., how good the timing would be, in case you are interested in reaction times and so on), but have a look at www.visionegg.org. It's a stimulus-presentation framework, written by Andrew Straw entirely in Python, and it has lots of great features. cheers, Christoph. P.S. please let me know if you succeed with a solution in R!

On Mon, 2004-06-21 at 15:27, Christoph Lange wrote: Dear all! Although the Psycho-Toolbox for Matlab is free software, Matlab isn't. I'm planning to do an experiment where it's essential to travel to the subjects, not let the subjects come to where the Matlab licences are :-( So I need to use free software for my experiment if I don't want to buy an extra Matlab licence (which I don't want to). Did anyone ever try to do presentation of visual stimuli (images, practically, with a little bit of text in my case) with R? I looked into the documentation of rgl, but what's lacking there is (as far as I saw) the possibility to also read (unbuffered) keyboard input. So what I need is: 1. put images onto the (full!)screen (quick), 2. read keyboard input, 3. write results (to an R structure, presumably). Any idea, suggestion? Cheers, Christoph.

-- Christoph Lehmann [EMAIL PROTECTED] __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] lme newbie question
Hi, I try to implement a simple 2-factorial repeated-measures anova in the lme framework and would be grateful for a short feedback.

- my dependent var is a reaction time (rt),
- as independent vars I have:
  - the age group (0/1) the subject belongs to (so this is a between-subject factor), and
  - two WITHIN-subject experimental conditions, one (angles) having 5 and the other (hands) 3 factor levels; i.e. each subject performs on 3 * 5 = 15 different task difficulties.

Am I right with this lme implementation, when I want to investigate the influence of the age group and the two conditions on rt:

my.lme <- lme(rt ~ age.group + angles * hands, data = my.data, random = ~ 1 | subject)

Then I think I would have to compare the model above with a more elaborate one, including more interactions:

my.lme2 <- lme(rt ~ age.group * angles * hands, data = my.data, random = ~ 1 | subject)

and compare them by performing a likelihood-ratio test, yes? I think, if I would like to generalize the influence of the experimental conditions on rt, I should define angles and hands as random effects, yes? Thanks for a short feedback. It seems repeated-measures anovas aren't a trivial topic in R :) Cheers! Christoph -- Christoph Lehmann [EMAIL PROTECTED] __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
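The likelihood-ratio comparison described above can be sketched as follows (my sketch with simulated stand-in data, since the poster's my.data is not available; all variable names are the poster's, the method = "ML" choice is mine: REML fits with different fixed effects must not be compared by a likelihood-ratio test):

```r
library(nlme)   # lme(); nlme ships with R

set.seed(1)
# simulated stand-in for the poster's data
my.data <- expand.grid(subject = factor(1:20),
                       angles  = factor(1:5),
                       hands   = factor(1:3))
my.data$age.group <- factor(as.integer(my.data$subject) %% 2)
my.data$rt <- rnorm(nrow(my.data), mean = 500, sd = 50)

# both models fitted with ML so the likelihood-ratio test is valid
my.lme  <- lme(rt ~ age.group + angles * hands, data = my.data,
               random = ~ 1 | subject, method = "ML")
my.lme2 <- lme(rt ~ age.group * angles * hands, data = my.data,
               random = ~ 1 | subject, method = "ML")

anova(my.lme, my.lme2)  # likelihood-ratio test of the extra interactions
```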
[R] classification with nnet: handling unequal class sizes
Dear Prof. Ripley,

Since you are the creator of the MASS library I dare to ask you a short question, for which I didn't get an answer from the R mailing list. If you feel disturbed by my question, please forgive me and just ignore my mail.

I use the nnet code from your book, VR p. 348: the very nice and general function CVnn2() to choose the number of hidden units and the amount of weight decay by an inner cross-validation, with a slight modification to use it for classification (see below). My data has 2 classes of unequal size: 45 observations for classI and 116 for classII (number of variables: 39). With CVnn2 I get the following confusion matrix (in %, average of 10 runs):

        predicted
true    53  47
        16  84

I had a similarly biased confusion matrix with randomForest until I used the sampsize argument (the same holds for svm until I used the class.weights argument). How can I handle this problem of unequal class sizes with nnet, in order to get a less biased confusion matrix? (With randomForest I finally got

        78  22
        16  84 .)

many thanks for a hint. By the way, I just want to say 'thank you' for your great MASS book. Since your first recommendation, I consult it quite frequently. Christoph

#--- neural networks
# a classification network is constructed; this has one output and entropy
# fit if the number of levels is two, and a number of outputs equal to the
# number of classes and a softmax output stage for more levels.
# -> therefore two lines of Prof. Ripley's wrapper function are changed below
# (original commented out) and an additional function has been introduced
# (resmatrix)

con <- function(...) {
  print(tab <- table(...))
  diag(tab) <- 0
  cat("error rate = ", round(100*sum(tab)/length(list(...)[[1]]), 2), "%\n")
  invisible()
}

CVnn2 <- function(formula, data, size = c(0,4,4,10,10),
                  lambda = c(0, rep(c(0.001, 0.01), 2)),
                  nreps = 1, nifold = 5, verbose = 99, ...)
{
  resmatrix <- function(predict.matrix, learn, data, ri, i) {
    rae.matrix <- predict.matrix
    rae.matrix[,] <- 0
    rae.vector <- as.numeric(as.factor(predict(learn, data[ri == i,], type = "class")))
    for (k in 1:dim(rae.matrix)[1]) {
      if (rae.vector[k] == 1) rae.matrix[k,1] <- rae.matrix[k,1] + 1
      else rae.matrix[k,2] <- rae.matrix[k,2] + 1
    }
    rae.matrix
  }
  CVnn1 <- function(formula, data, nreps = 1, ri, verbose, ...) {
    totalerror <- 0
    truth <- data[, deparse(formula[[2]])]
    res <- matrix(0, nrow(data), length(levels(truth)))
    if (verbose > 20) cat("  inner fold")
    for (i in sort(unique(ri))) {
      if (verbose > 20) cat(" ", i, sep = "")
      for (rep in 1:nreps) {
        learn <- nnet(formula, data[ri != i,], trace = F, ...)
        #res[ri == i,] <- res[ri == i,] + predict(learn, data[ri == i,])
        res[ri == i,] <- res[ri == i,] + resmatrix(res[ri == i,], learn, data, ri, i)
      }
    }
    if (verbose > 20) cat("\n")
    sum(as.numeric(truth) != max.col(res/nreps))
  }
  truth <- data[, deparse(formula[[2]])]
  res <- matrix(0, nrow(data), length(levels(truth)))
  choice <- numeric(length(lambda))
  for (i in sort(unique(rand))) {
    if (verbose > 0) cat("fold ", i, "\n", sep = "")
    ri <- sample(nifold, sum(rand != i), replace = T)
    for (j in seq(along = lambda)) {
      if (verbose > 10) cat("  size =", size[j], "decay =", lambda[j], "\n")
      choice[j] <- CVnn1(formula, data[rand != i,], nreps = nreps, ri = ri,
                         size = size[j], decay = lambda[j], verbose = verbose, ...)
    }
    decay <- lambda[which.is.max(-choice)]
    csize <- size[which.is.max(-choice)]
    if (verbose > 5) cat("  #errors:", choice, " ")
    # if (verbose > 1) cat("  chosen size = ", csize, " decay = ", decay, "\n", sep = "")
    for (rep in 1:nreps) {
      learn <- nnet(formula, data[rand != i,], trace = F, size = csize, decay = decay, ...)
      #res[rand == i,] <- res[rand == i,] + predict(learn, data[rand == i,])
      res[rand == i,] <- res[rand == i,] + resmatrix(res[rand == i,], learn, data, rand, i)
    }
  }
  factor(levels(truth)[max.col(res/nreps)], levels = levels(truth))
}

-- Christoph Lehmann, Department of Psychiatric Neurophysiology, University Hospital of Clinical Psychiatry, Waldau, CH-3000 Bern 60; Phone: ++41 31 930 93 83, Mobile: ++41 76 570 28 00, Fax: ++41 31 930 99 61, [EMAIL PROTECTED], http://www.puk.unibe.ch/cl/pn_ni_cv_cl_03.html

__ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
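One way to build the weight vector alluded to above (my sketch of the standard inverse-class-frequency scheme, not code from the book or the thread): give each observation a case weight inversely proportional to the size of its class, so that both classes contribute the same total weight to the fit:

```r
# Case weights inversely proportional to class frequency
# (GROUP stands for the 2-level class factor of a training bag)
GROUP <- factor(c(rep("classI", 45), rep("classII", 116)))
tab <- table(GROUP)
w <- as.numeric(length(GROUP) / (length(tab) * tab[as.character(GROUP)]))

# each class now carries the same total weight:
tapply(w, GROUP, sum)   # both sums equal length(GROUP)/2
```

Such a vector can be passed as the weights argument of nnet() in place of the rep(1, ...) placeholder.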
[R] classification with nnet: handling unequal class sizes
I hope this question is adequate for this list. I use the nnet code from V&R, p. 348: the very nice and general function CVnn2(), which chooses the number of hidden units and the amount of weight decay by an inner cross-validation, with a slight modification to use it for classification (see below).

My data has 2 classes of unequal size: 45 observations for class I and 116 observations for class II. With CVnn2() I get the following confusion matrix (%, average of 10 runs):

           predicted
true     classI  classII
classI     53      47
classII    16      84

I had a similarly biased confusion matrix with randomForest until I used the sampsize argument (the same holds for svm until I used the class.weights argument). How can I handle this problem of unequal class sizes with nnet, in order to get a less biased confusion matrix? (With randomForest I finally got

           predicted
true     classI  classII
classI     78      22
classII    16      84 )

many thanks for a hint
Christoph

#--- neural networks
# A classification network is constructed; this has one output and entropy
# fit if the number of levels is two, and a number of outputs equal to the
# number of classes and a softmax output stage for more levels. Therefore two
# lines of Prof. Ripley's wrapper function are changed below (original
# commented out) and an additional function has been introduced (resmatrix).

con <- function(...)
{
    print(tab <- table(...))
    diag(tab) <- 0
    cat("error rate = ",
        round(100*sum(tab)/length(list(...)[[1]]), 2), "%\n")
    invisible()
}

CVnn2 <- function(formula, data,
                  size = c(0,4,4,10,10),
                  lambda = c(0, rep(c(0.001, 0.01), 2)),
                  nreps = 1, nifold = 5, verbose = 99, ...)
{
    resmatrix <- function(predict.matrix, learn, data, ri, i)
    {
        rae.matrix <- predict.matrix
        rae.matrix[,] <- 0
        rae.vector <- as.numeric(as.factor(predict(learn, data[ri == i,],
                                                   type = "class")))
        for (k in 1:dim(rae.matrix)[1]) {
            if (rae.vector[k] == 1) rae.matrix[k,1] <- rae.matrix[k,1] + 1
            else rae.matrix[k,2] <- rae.matrix[k,2] + 1
        }
        rae.matrix
    }
    CVnn1 <- function(formula, data, nreps = 1, ri, verbose, ...)
    {
        totalerror <- 0
        truth <- data[, deparse(formula[[2]])]
        res <- matrix(0, nrow(data), length(levels(truth)))
        if(verbose > 20) cat("  inner fold")
        for (i in sort(unique(ri))) {
            if(verbose > 20) cat(" ", i, sep = "")
            for(rep in 1:nreps) {
                learn <- nnet(formula, data[ri != i,], trace = FALSE, ...)
                # res[ri == i,] <- res[ri == i,] + predict(learn, data[ri == i,])
                res[ri == i,] <- res[ri == i,] +
                    resmatrix(res[ri == i,], learn, data, ri, i)
            }
        }
        if(verbose > 20) cat("\n")
        sum(as.numeric(truth) != max.col(res/nreps))
    }
    truth <- data[, deparse(formula[[2]])]
    res <- matrix(0, nrow(data), length(levels(truth)))
    choice <- numeric(length(lambda))
    # 'rand' (the outer fold assignment) is assumed to exist in the calling
    # environment, as in the V&R scripts
    for (i in sort(unique(rand))) {
        if(verbose > 0) cat("fold ", i, "\n", sep = "")
        ri <- sample(nifold, sum(rand != i), replace = TRUE)
        for(j in seq(along = lambda)) {
            if(verbose > 10)
                cat("  size =", size[j], "decay =", lambda[j], "\n")
            choice[j] <- CVnn1(formula, data[rand != i,], nreps = nreps,
                               ri = ri, size = size[j], decay = lambda[j],
                               verbose = verbose, ...)
        }
        decay <- lambda[which.is.max(-choice)]
        csize <- size[which.is.max(-choice)]
        if(verbose > 5) cat("  #errors:", choice, "  ")
        if(verbose > 1)
            cat("chosen size = ", csize, " decay = ", decay, "\n", sep = "")
        for(rep in 1:nreps) {
            learn <- nnet(formula, data[rand != i,], trace = FALSE,
                          size = csize, decay = decay, ...)
            # res[rand == i,] <- res[rand == i,] + predict(learn, data[rand == i,])
            res[rand == i,] <- res[rand == i,] +
                resmatrix(res[rand == i,], learn, data, rand, i)
        }
    }
    factor(levels(truth)[max.col(res/nreps)], levels = levels(truth))
}

-- Christoph Lehmann
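One common remedy, not discussed in the post itself, is to give each class the same total influence via nnet's weights argument (case weights). A hedged sketch on made-up data with the same 45/116 imbalance (all data and parameter values here are hypothetical):

```r
library(nnet)
set.seed(1)
# hypothetical two-class data with a 45/116 imbalance like the poster's
n1 <- 45; n2 <- 116
dat <- data.frame(x1  = c(rnorm(n1, 1), rnorm(n2, -1)),
                  x2  = c(rnorm(n1, 1), rnorm(n2, -1)),
                  cls = factor(rep(c("classI", "classII"), c(n1, n2))))
# inverse-frequency case weights: both classes get the same total weight
w <- as.numeric(1 / table(dat$cls)[as.character(dat$cls)])
fit <- nnet(cls ~ x1 + x2, data = dat, weights = w,
            size = 2, decay = 0.01, maxit = 200, trace = FALSE)
table(true = dat$cls, predicted = predict(fit, dat, type = "class"))
```

The same weights vector can be passed through CVnn2's `...` argument, since it forwards extra arguments to nnet().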
[R] pca scores for newdata
Hi, I used princomp() on a dataset x[!sub,]. How can I get the scores for another dataset, say x[sub,]? I didn't succeed using predict().

thanks for a hint
cheers
christoph
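For the record, predict() does work on princomp objects when newdata has the same column names as the training data; a minimal sketch with made-up data ('sub' assumed to be a logical row index, as in the post):

```r
set.seed(1)
x <- as.data.frame(matrix(rnorm(200), 50, 4))  # made-up stand-in for the poster's data
sub <- seq_len(nrow(x)) <= 10
pc <- princomp(x[!sub, ])
# predict() applies the training centering/scaling before projecting
# onto the loadings, so the scores are comparable to pc$scores
new.scores <- predict(pc, newdata = x[sub, ])
dim(new.scores)  # 10 rows, one column per component
```

A frequent pitfall is passing a matrix whose columns are named differently from those used in the fit; predict.princomp matches newdata columns by name.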
[R] PLS scores for newdata
Hi, I used pls.pcr on a dataset x[!sub,]. How can I get the scores for another dataset, say x[sub,]? I couldn't reproduce the scores for the training set by multiplying the data with the loadings, even after first scaling the data with scale(). pls.pcr only gives the XScores for the training data.

many thanks for a hint
cheers
christoph
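A likely reason the manual multiplication fails is that the new data must be centered (and scaled, if applicable) with the *training* statistics, not its own. A generic sketch, explicitly not the pls.pcr API, using an SVD-based stand-in for a loadings matrix:

```r
# generic sketch (NOT the pls.pcr API): scores for new data require applying
# the training centering before multiplying by the loadings matrix P
set.seed(1)
Xtrain <- matrix(rnorm(100), 20, 5)   # made-up training data
Xnew   <- matrix(rnorm(25), 5, 5)     # made-up new data
ctr <- colMeans(Xtrain)
s <- svd(scale(Xtrain, center = ctr, scale = FALSE))
P <- s$v[, 1:2]                       # stand-in for 2 latent-variable loadings
train.scores <- scale(Xtrain, center = ctr, scale = FALSE) %*% P
new.scores   <- scale(Xnew,  center = ctr, scale = FALSE) %*% P
```

If scale() was applied with its own (new-data) column means, the reproduced training scores will not match the ones the fitting function reports.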
[R] my own function given to lapply
Hi, it seems I just miss something. I defined

treshold <- function(pred)
{
  if (pred < 0.5) pred <- 0
  else pred <- 1
  return(pred)
}

and want to apply it on a vector:

sapply(mylist[,,3], threshold)

but I get:

Error in match.fun(FUN) : Object "threshold" not found

thanks for help
cheers
chris
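The error comes from the spelling mismatch: the function is defined as 'treshold' but called as 'threshold'. With the name fixed, sapply() works; for a numeric vector a vectorized ifelse() avoids the element-wise loop entirely:

```r
# name fixed; the body can also be written without assignment
threshold <- function(pred) {
  if (pred < 0.5) 0 else 1
}

preds <- c(0.2, 0.7, 0.5)     # made-up example vector
sapply(preds, threshold)      # 0 1 1
ifelse(preds < 0.5, 0, 1)     # same result, vectorized
```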
[R] nlme vs aov with Error() for an ANCOVA
Hi, I computed a multiple linear regression with repeated measures on one explanatory variable: BOLD peak (blood oxygenation) as dependent variable, and as independent variables: age.group (binary: young(0)/old(1)) and task difficulty, measured by means of the reaction time 'rt'. For 'rt' I have repeated measurements, since each subject did 12 different tasks, so it can be seen as an ANCOVA:

subject  age.group  bold  rt
subj1    0          0.08  0.234
subj1    0          0.05  0.124
..
subj1    0          0.07  0.743
subj2    0          0.06  0.234
subj2    0          0.02  0.183
..
subj2    0          0.05  0.532
subjn    1          0.09  0.234
subjn    1          0.06  0.155
..
subjn    1          0.07  0.632

I decided to use the nlme library:

patrizia.lme <- lme(bold ~ rt*age.group, data = patrizia.data1,
                    random = ~ rt | subject)
summary(patrizia.lme)

Linear mixed-effects model fit by REML
 Data: patrizia.data1
       AIC      BIC    logLik
  272.2949 308.3650 -128.1474

Random effects:
 Formula: ~rt | subject
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev       Corr
(Intercept) 0.2740019518 (Intr)
rt          0.0004756026 -0.762
Residual    0.2450787149

Fixed effects: bold ~ rt + age.group + rt:age.group
                   Value  Std.Error  DF   t-value p-value
(Intercept)   0.06109373 0.11725208 628  0.521046  0.6025
rt            0.00110117 0.00015732 628  6.999501  0.0000
age.group    -0.03750787 0.13732793  43 -0.273126  0.7861
rt:age.group -0.00031919 0.00018259 628 -1.748115  0.0809
 Correlation:
             (Intr) rt     ag.grp
rt           -0.818
age.group    -0.854  0.698
rt:age.group  0.705 -0.862 -0.805

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max
-3.6110596 -0.5982741 -0.0408144  0.5617381  4.8648242

Number of Observations: 675
Number of Groups: 45
#--- end output

If the model assumptions hold, this means we don't have a significant age effect, but a highly significant task effect, and the interaction is significant at the 0.1 level. I am now interested whether one could do the analysis also using aov() and the Error() option, e.g. may I do:

l <- aov(bold ~ rt*age.group + Error(subject/rt), data = patrizia.data1)
summary(l)

Error: subject
   Df    Sum Sq   Mean Sq
rt  1 0.0022087 0.0022087

Error: subject:rt
   Df Sum Sq Mean Sq
rt  1 40.706  40.706

Error: Within
              Df  Sum Sq Mean Sq F value    Pr(>F)
rt             1   2.422   2.422 10.0508  0.001592 **
age.group      1   8.722   8.722 36.2022 2.929e-09 ***
rt:age.group   1   0.277   0.277  1.1494  0.284060
Residuals    669 161.187   0.241
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

which looks weird. Or what would you recommend?

thanks a lot
Christoph
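A hedged sketch on simulated data (not the poster's): with a continuous within-subject covariate such as rt, a per-subject random intercept is usually written Error(subject) rather than Error(subject/rt), and can be compared directly against the corresponding random-intercept lme() fit. The data-generating values below are invented:

```r
library(nlme)
set.seed(1)
d <- data.frame(subject   = factor(rep(1:10, each = 12)),
                age.group = rep(0:1, each = 60),   # between-subject factor
                rt        = runif(120, 100, 800))  # within-subject covariate
# simulate a true rt slope of 0.001 plus a per-subject intercept shift
d$bold <- 0.05 + 0.001 * d$rt + rep(rnorm(10, sd = 0.02), each = 12) +
          rnorm(120, sd = 0.02)
a <- aov(bold ~ rt * age.group + Error(subject), data = d)
l <- lme(bold ~ rt * age.group, random = ~ 1 | subject, data = d)
summary(a)
summary(l)
```

The two fits should give closely matching fixed-effect tests for this balanced design; whether a random rt slope (as in the poster's random = ~ rt | subject) is needed is a separate modelling question.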
[R] array(list(),c(2,5)) gives error in R 1.8.1
Hi, in R 1.7 the following worked fine:

> array(list(), c(2,5))
     [,1] [,2] [,3] [,4] [,5]
[1,] NULL NULL NULL NULL NULL
[2,] NULL NULL NULL NULL NULL

Now in R 1.8.1 I get the error:

Error in rep.int(data, t1) : invalid number of copies in rep
In addition: Warning message:
NAs introduced by coercion

thanks for help; I need this possibility for storing objects (lm results) in an array.

cheers
Christoph
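A workaround sketch that sidesteps array()'s recycling of the (empty) data argument: build a list of the right length first, then set its dim attribute:

```r
# a list can carry a dim attribute, giving an "array of lists"
res <- vector("list", 2 * 5)
dim(res) <- c(2, 5)
# store a model object in one cell (cars is a built-in data set)
res[[1, 3]] <- lm(dist ~ speed, data = cars)
class(res[[1, 3]])  # "lm"
```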
[R] comparing classification methods: 10-fold cv or leaving-one-out ?
Hi, what would you recommend to compare classification methods such as LDA, classification trees (rpart), bagging, SVM, etc.: 10-fold CV (as in Ripley, p. 346f) or leave-one-out (as e.g. implemented in lda())? My data set is not that huge (roughly 200 entries).

many thanks for a hint
Christoph
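Whichever scheme is chosen, both are easy to set up; a sketch of 10-fold CV with lda() on a built-in data set (iris used purely for illustration), with leave-one-out shown via lda's built-in CV = TRUE:

```r
library(MASS)
set.seed(1)
k <- 10
folds <- sample(rep(1:k, length.out = nrow(iris)))
cv.err <- sapply(1:k, function(i) {
  fit <- lda(Species ~ ., data = iris[folds != i, ])
  mean(predict(fit, iris[folds == i, ])$class != iris$Species[folds == i])
})
mean(cv.err)                     # 10-fold CV error estimate
loo <- lda(Species ~ ., data = iris, CV = TRUE)
mean(loo$class != iris$Species)  # leave-one-out error
```

With only ~200 observations, repeating the 10-fold split over several random fold assignments and averaging reduces the variance of the estimate.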
[R] lda() called with data=subset() command
Hi, I have a data.frame with a grouping variable having the levels "C", "mild AD", "mod AD", "O" and "S". Since I want to compute an lda only for the two groups 'C' and 'mod AD', I call lda with data = subset(mydata.pca, GROUP == 'mod AD' | GROUP == 'C'):

my.lda <- lda(GROUP ~ Comp.1 + Comp.2 + Comp.3 + Comp.4 + Comp.5 +
              Comp.6 + Comp.7 + Comp.8,
              data = subset(mydata.pca, GROUP == 'mod AD' | GROUP == 'C'),
              CV = TRUE)

This results in the warning

group(s) mild AD O S are empty in: lda.default(x, grouping, ...)

of course... my.lda$class now shows

 [1] C       C       C       C       C       C       C       C       C
[10] C       C       C       C       C       C       C       C       C
[19] C       C       C       mild AD mild AD mild AD mild AD mild AD mild AD
[28] mild AD C       mild AD mild AD mild AD C       C       mild AD mild AD
[37] mild AD mild AD
Levels: C mild AD mod AD O S

It seems it just took the second level (mild AD) for the second class, even though that level was not used for the lda computation (only the first level (C) and the third level (mod AD) were). What shall I do to resolve this (little) problem?

thanks for a hint
christoph
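The usual fix is to re-apply factor() to the subsetted grouping variable so the empty levels are dropped before lda() sees it; a sketch with made-up data standing in for mydata.pca:

```r
library(MASS)
set.seed(1)
# hypothetical stand-in for the poster's data frame
d <- data.frame(GROUP = factor(rep(c("C", "mild AD", "mod AD"), each = 10)),
                Comp.1 = rnorm(30), Comp.2 = rnorm(30))
sub <- subset(d, GROUP == 'mod AD' | GROUP == 'C')
sub$GROUP <- factor(sub$GROUP)   # drops the empty 'mild AD' level
fit <- lda(GROUP ~ Comp.1 + Comp.2, data = sub, CV = TRUE)
levels(fit$class)                # "C" "mod AD"
```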
[R] lmList error if NA in variable not used
I try to fit an lmList model. If a variable (a column) in the data frame used has some NA values, lmList gives an error, even though this variable is not used in the model:

Error in na.fail.default(data) : missing values in object

Why? Or what is my mistake?

thanks
cheers
christoph
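The reason is that nlme's lmList() applies its na.action (default na.fail) to the whole data frame, not just the model variables, so an NA anywhere triggers the error. Passing na.action = na.omit, or dropping the offending column first, works around it; a sketch with made-up data:

```r
library(nlme)
set.seed(1)
d <- data.frame(g = factor(rep(1:3, each = 4)),
                x = rnorm(12), y = rnorm(12),
                unused = c(NA, rnorm(11)))  # NA in a column not in the model
fits <- lmList(y ~ x | g, data = d, na.action = na.omit)
length(fits)  # one lm fit per group
```

Note that na.omit drops whole rows, so the row carrying the NA is lost even though only an unused column was missing; subsetting the columns (d[, c("g", "x", "y")]) keeps every observation.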
[R] partial proportional odds model (PPO)
Hi, since the 'equal slopes' assumption doesn't hold in my data, I cannot use a proportional odds model ('Design' library, together with 'Hmisc'). I would like to try therefore a partial proportional odds model. Please, could anybody tell me where to find the code and how to specify such a model, or any potential alternatives?

many thanks for your kind help
christoph
[R] R^2 analogue in polr() and prerequisites for polr()
Hi, (1) in polr(), is there any way to calculate a pseudo-analogue to R^2, just for use as a purely descriptive statistic of the goodness of fit? (2) And: what are the assumptions which must be fulfilled so that the results of polr() (t-values, etc.) are valid? How can I test these prerequisites most easily? I have a three-level (ordered factor) response and four metric variables.

many thanks
Christoph
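For (1), one common descriptive choice (my suggestion, not from the thread) is a McFadden-style pseudo-R^2, one minus the ratio of the fitted deviance to the intercept-only deviance; a sketch on the MASS 'housing' data:

```r
library(MASS)
fit  <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
null <- polr(Sat ~ 1, weights = Freq, data = housing)
# McFadden-style pseudo-R^2 from the residual deviances
pseudoR2 <- 1 - fit$deviance / null$deviance
pseudoR2
```

It is purely descriptive and does not share the variance-explained interpretation of the linear-model R^2.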
Re: [R] R^2 analogue in polr() and prerequisites for polr()
many thanks! I was just asking for an R-square analogue since the students I will present the results to might like to know how the measure of fit in an ordinal regression (e.g. the residual deviance) compares to measures they know from introductory courses on linear regression (such as R-square), i.e. how much of the variance of the dependent variable can be explained by the variance of the independent variables.

thanks and best regards
christoph

On Tue, 2003-12-09 at 07:26, Prof Brian Ripley wrote:
> On 8 Dec 2003, Christoph Lehmann wrote:
> > (1) In polr(), is there any way to calculate a pseudo-analogue to R^2,
> > just for use as a purely descriptive statistic of the goodness of fit?
> First define the statistic you are interested in! There is an absolute
> measure of fit, the residual deviance.
> > (2) And: what are the assumptions which must be fulfilled so that the
> > results of polr() (t-values, etc.) are valid? How can I test these
> > prerequisites most easily? I have a three-level (ordered factor)
> > response and four metric variables.
> This is discussed with worked examples in the book that MASS supports,
> so please consult your copy.
[R] criterion for variable selection in LDA
Hi, since a stepwise procedure for variable selection (as e.g. in SPSS) for an LDA is not implemented in R, and anyway I cannot be sure that all the required assumptions for e.g. a procedure using a statistic based on Wilks' lambda hold (such as normality and variance homogeneity), I would like to ask what you would recommend: shall I e.g. define a criterion such as the error rate stemming from leave-one-out cross-validation and then write my own procedure for including/removing variables? Or what would be the gold standard for such a case? (My case: 2 groups (n1=30, n2=15), 37 potential variables, no equal variance in the two groups.)

many thanks
cheers
christoph
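The poster's own idea is straightforward to sketch: score variable subsets by the leave-one-out error that lda(..., CV = TRUE) returns for free, and add variables greedily. Shown here on made-up noise data (all names and sizes are illustrative only):

```r
library(MASS)
set.seed(1)
# made-up data: 2 groups (30/15), 6 candidate variables X1..X6
d <- data.frame(g = factor(rep(1:2, c(30, 15))), matrix(rnorm(45 * 6), 45, 6))
vars <- setdiff(names(d), "g")

# leave-one-out error for a given variable subset
loo.err <- function(v) {
  fit <- lda(d[, v, drop = FALSE], grouping = d$g, CV = TRUE)
  mean(fit$class != d$g)
}

err1  <- sapply(vars, function(v) loo.err(v))
best1 <- names(which.min(err1))                    # best single variable
err2  <- sapply(setdiff(vars, best1),
                function(v) loo.err(c(best1, v)))  # best partner for it
```

With 37 candidates and n=45, the selected error rate will be optimistically biased; a nested (outer) cross-validation around the whole selection procedure is needed for an honest estimate.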
[R] using split.screen() in Sweave
Dear R and Sweave users, a further problem which I couldn't resolve using the manual: in R I use the split.screen command to put e.g. two timecourses one above the other into one plot:

split.screen(c(2,1))
screen(1)
plot(stick, type = 'h', col = "red", lwd = 2)
screen(2)
plot(deconvolution.amplitude, type = 'h', col = "blue", lwd = 2)

Is there a similar way of doing this in Sweave?

many thanks
Christoph
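One answer sketch: within a single Sweave figure chunk, par(mfrow = c(2, 1)) (or layout()) stacks two base plots in one figure, since the whole chunk is evaluated on one device; split.screen() should also work as long as all screen() calls happen inside the same chunk. The vectors below are placeholders for the poster's timecourses:

```r
par(mfrow = c(2, 1))               # two rows, one column on the same device
plot(1:10, type = "h", col = "red",  lwd = 2)  # stand-in for 'stick'
plot(10:1, type = "h", col = "blue", lwd = 2)  # stand-in for the amplitudes
```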
[R] plot width in Sweave
Hi, I didn't find this in the manual: I need to change the width of a plot while I use Sweave. Which command/parameters should I insert below to change the width of a plot?

\begin{figure}[htbp]
\begin{center}
<<echo=TRUE, fig=TRUE>>=
plot(Re(q), ylab = "", type = "o", col = "blue", lwd = 1, sub = mystring)
@
\caption{Original stick function (stimulus train)}
\end{center}
\end{figure}

many thanks
christoph
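A sketch of the two knobs involved (standard Sweave options, assuming a stock Sweave setup): the chunk options width/height set the size of the R graphics device in inches, while \setkeys{Gin}{width=...} controls the width LaTeX scales the included graphic to:

```
\setkeys{Gin}{width=0.6\textwidth}   % LaTeX-side display width
<<echo=TRUE, fig=TRUE, width=8, height=4>>=
plot(Re(q), ylab = "", type = "o", col = "blue", lwd = 1, sub = mystring)
@
```

Changing only the device size alters proportions and font size relative to the plot; changing only the Gin width rescales the finished graphic.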
[R] conversion of some lines of matlab code to R
Dear R-users, I got some Matlab code (just a few lines) for the deconvolution of a signal:

function q=Dconvolve(funct)
w=squeeze(size(funct)); % gets the length
t=0:3:(w(1,1)-1)*3;
h=DeHemo(t,1.25,3);
r=fft(h);
rinv = 1./r;
q = real(ifft( fft(funct(:)).*rinv(:)));
%plot(t/0.75,q/max(q),'r-',t/0.75,funct/max(funct),'g-')

function h=DeHemo(t,tau,N)
h=(t/tau).^(N-1).*exp(-t/tau)./(tau*factorial(N-1));

Since I don't know Matlab: could anyone tell me how these lines would look in R, i.e. how I could do this deconvolution in R?

Many thanks
Cheers
Christoph
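A hedged line-by-line translation sketch (function names kept from the post; the key correspondences are that Matlab's ifft(z) is fft(z, inverse = TRUE)/length(z) in R, and factorial() exists in R too):

```r
DeHemo <- function(t, tau, N)
  (t/tau)^(N - 1) * exp(-t/tau) / (tau * factorial(N - 1))

Dconvolve <- function(funct) {
  w <- length(funct)                  # w = squeeze(size(funct))
  t <- seq(0, (w - 1) * 3, by = 3)    # t = 0:3:(w-1)*3
  h <- DeHemo(t, 1.25, 3)
  rinv <- 1 / fft(h)                  # rinv = 1./fft(h)
  # q = real(ifft(fft(funct(:)).*rinv(:)))
  Re(fft(fft(funct) * rinv, inverse = TRUE) / w)
}

# made-up test signal; the poster's 'funct' would go here
q <- Dconvolve(sin(seq(0, 2 * pi, length.out = 32)))
```

As in the Matlab original, this divides by fft(h) elementwise, so frequencies where fft(h) is near zero will blow up; in practice a regularized inverse is often substituted.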
[R] convolution question
Dear R-users, I have a question about convolution, using the convolve() command: the following code gives a function which will be convolved with a train of delta functions. Can anybody tell me why the convolved function doesn't have the same length as the original train of delta functions?

tau <- 1.25
N <- 3
t <- seq(0, 15, by = 3)
h <- (t/tau)^(N-1)*exp(-t/tau)/(tau*prod(N-1))
plot(h, type = 'o')

stick <- rep(0, 100)
ones <- round(runif(66)*99) + 1
stick[ones] <- 1
stick
X11()  # open another graphics device
plot(stick)

convolution <- convolve(stick, h, conj = FALSE, type = 'filter')
X11()  # open another graphics device
plot(convolution, type = 'o')

many thanks!
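The length difference is documented behaviour of convolve(): type = "filter" returns only the "valid" part of the convolution, of length length(x) - length(y) + 1, while type = "open" returns the full convolution of length length(x) + length(y) - 1. In the post, h has 6 points (seq(0, 15, by = 3)), so the filtered result has 100 - 6 + 1 = 95 points:

```r
x <- rep(1, 100)   # like 'stick' above
y <- rep(1, 6)     # like 'h': seq(0, 15, by = 3) has 6 points
length(convolve(x, y, conj = FALSE, type = "filter"))  # 95
length(convolve(x, y, type = "open"))                  # 105
```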
[R] using identify() together with plot () and pixmap()
Dear R users, I have a two-dimensional array whose values I want to plot using the pixmapGrey class. Plotting works fine, and now I would like to identify some of the points in the plot using identify(). But I get the following message when pressing the left mouse button:

plot(pixmapGrey(fmri.vtc[,,slice,volume]))
identify(fmri.vtc[,,slice,volume])
warning: no point within 0.25 inches

Pressing the right mouse button I get: numeric(0). What is the problem here and how can I solve it?

many thanks for your help
Cheers!
Christoph
Re: [R] overlay two pixmap
when I try to overlay a completely transparent pixmap on another pixmap, I get an error. For reproduction: just the transparent pixmap itself gives an error:

tmp <- array(0, c(x.dim, y.dim))
tmp <- pixmapIndexed(tmp[,])
for (x in 1:x.dim) {
  for (y in 1:y.dim) {
    [EMAIL PROTECTED],y] <- NA
  }
}
plot(tmp)

Error in image.default(x = X, y = Y, z = t([EMAIL PROTECTED]([EMAIL PROTECTED]):1, ]), :
  invalid z limits
In addition: Warning messages:
1: no finite arguments to min; returning Inf
2: no finite arguments to max; returning -Inf

What happened here?

many thanks
christoph

On Fri, 2003-09-26 at 12:44, Roger Bivand wrote:
> Christoph Lehmann wrote:
> > I need to overlay two pixmaps (library(pixmap)). One, a pixmapGrey, is
> > the basis, and on this I need to overlay a pixmapIndexed, BUT: the
> > pixmapIndexed has only some of its pixels set to an indexed color; many
> > of its pixels should not cover the underlying pixmapGrey pixels, i.e.
> > for the pixels not defined in the pixmapIndexed it should be
> > transparent. What would you recommend? Should I go for another
> > solution than pixmap?
> Determine which of the indexed colours in the pixmapIndexed object are to
> be transparent, and change them to NA - you access them in say:
>
> library(pixmap)
> x <- read.pnm(system.file("pictures/logo.ppm", package = "pixmap")[1])
> x
> plot(x)
> xx <- as(x, "pixmapIndexed")
> xx
> plot(xx)
> example(pixmap)
> z <- pixmapRGB(c(z1, z2, z3), 100, 100, bbox = c(0, 0, 100, 100))
> plot(z)
> [EMAIL PROTECTED]:20] [EMAIL PROTECTED] <- NA
> plot(xx, add=T)
>
> Here [EMAIL PROTECTED] was white aka "#FF", you could use col2rgb() to help set
> thresholds. Admittedly, this is messy if you don't know your threshold.
> Roger
[R] overlay two pixmap
Hi, I need to overlay two pixmaps (library(pixmap)). One, a pixmapGrey, is the basis, and on this I need to overlay a pixmapIndexed, BUT: the pixmapIndexed has only some of its pixels set to an indexed color; many of its pixels should not cover the underlying pixmapGrey pixels, i.e. for the pixels not defined in the pixmapIndexed it should be transparent. What would you recommend? Should I go for another solution than pixmap?

Many thanks
Cheers
Christoph
[R] Error from gls call (package nlme)
Hi, I have a huge array with series of data. For each cell in the array I fit a linear model, either using lm() or gls(). With lm() there is no problem, but with gls() I get an error:

Error in glsEstimate(glsSt, control = glsEstControl) :
  computed gls fit is singular, rank 2

as soon as there are data like this:

y1 <- c(0, 0, 0, 0)
x1 <- c(0, 1, 1.3, 0)
gls(y1 ~ x1)
Error in glsEstimate(glsSt, control = glsEstControl) :
  computed gls fit is singular, rank 2

Of course this is not a problem for lm():

lm(y1 ~ x1)
Call:
lm(formula = y1 ~ x1)
Coefficients:
(Intercept)           x1
          0            0

I know that such data make no sense, but it is possible that something like this occurs in my data set. Since I call gls() for every cell of my array in a loop, I don't want such errors to break my loop. What is the problem here? What are potential solutions?

Many thanks
Christoph
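A sketch of the usual guard for the loop problem: wrap each per-cell fit in try() (or tryCatch()) so a singular fit records NA instead of aborting the whole loop:

```r
library(nlme)
y1 <- c(0, 0, 0, 0)       # the degenerate cell from the post
x1 <- c(0, 1, 1.3, 0)
fit <- try(gls(y1 ~ x1), silent = TRUE)
# record NA for cells where gls() failed; otherwise keep the slope
slope <- if (inherits(fit, "try-error")) NA else coef(fit)[["x1"]]
slope
```

Inside the array loop, the same pattern fills a results array with NA wherever gls() cannot produce a fit.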
[R] storing objects (lm results) in an array
Hi, I have calculated lots (>1000) of linear models and would like to store each single result as an element of a 3D matrix or a similar structure, something like

glm[i][j][k] <- lm(...)

Since I read that results are lists: is it possible to define a matrix of type list? Or what do you recommend?

Many thanks
Christoph
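Yes: a matrix (or array) can hold list elements, and model objects are stored and retrieved with the `[[` indexing; a minimal sketch using the built-in cars data:

```r
# a 2 x 3 matrix whose cells are list elements
m <- matrix(vector("list", 6), nrow = 2, ncol = 3)
m[[1, 2]] <- lm(dist ~ speed, data = cars)
class(m[[1, 2]])   # "lm"
coef(m[[1, 2]])    # the stored fit is fully usable
```

For the 3D case, a plain list with a dim attribute of length three works the same way (dim(res) <- c(i, j, k), then res[[i, j, k]]).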
[R] anybody running Rggobi on a redhat 9.0 system?
Hi, my installation of ggobi (!) was successful, but when I try to install Rggobi as described on http://www.ggobi.org/INSTALL.html:

as non-root:
R_HOME=/usr/lib/R
export R_HOME
GGOBI_ROOT=/usr/local/src/ggobi
export GGOBI_ROOT
R_LIBS=/usr/lib/R/library
export R_LIBS

as root (su):
ln -s $GGOBI_ROOT/lib/libggobi.so /usr/lib/.
ln -s $GGOBI_ROOT/lib/libgtkext.so /usr/lib/.
R CMD INSTALL Rggobi_0.53-0.tar.gz

I get:

** R
** inst
** save image
Error in class<-(*tmp*, value = Class) : couldn't find function "objWithClass"
Warning message: package "methods" in options("defaultPackages") was not found
Error in class<-(*tmp*, value = Class) : couldn't find function "objWithClass"
Error in library(methods) : .First.lib failed
Execution halted
/usr/local/lib/R/bin/INSTALL: line 1: 14240 Broken pipe
cat ${R_PACKAGE_DIR}/R/${pkg}
ERROR: execution of package source for 'Rggobi' failed

many thanks for your help
Christoph
[R] PLS LDA
Dear R experts, I saw and downloaded the fresh pls package for R. Is there any way of using this pls package for PLS discriminant analysis? If not, is there any other package available? I need a way of classifying objects into e.g. two groups, where nbr_observations << nbr_variables.

many thanks for your kind help
Christoph
[R] logistic regression for a data set with perfect separation
Dear R experts, I have the following data:

      V1 V2
1 -5.800  0
2 -4.800  0
3 -2.867  0
4 -0.867  0
5 -0.733  0
6 -1.667  0
7 -0.133  1
8  1.200  1
9  1.333  1

and I want to know whether V1 can predict V2. Of course it can, since there is a perfect separation between cases 1..6 and 7..9. How can I test whether this conclusion (being able to assign an observation i to class j knowing only its value on variable V1) also holds for the population our data were drawn from? That is, which inference procedure is recommended? Logistic regression, for obvious reasons, makes no sense here.

Many thanks for your help
Christoph
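For reference, glm() makes the separation visible via the well-known warning "fitted probabilities numerically 0 or 1 occurred" and divergent coefficients; one simple exact alternative (my suggestion, not from the thread) is Fisher's exact test on the observed split, dichotomizing V1 in the gap between the two groups:

```r
V1 <- c(-5.800, -4.800, -2.867, -0.867, -0.733, -1.667, -0.133, 1.200, 1.333)
V2 <- c(0, 0, 0, 0, 0, 0, 1, 1, 1)
# the MLE does not exist under separation; coefficients grow without bound
fit <- suppressWarnings(glm(V2 ~ V1, family = binomial))
# exact test of the perfect split (-0.4 lies in the gap between the classes)
fisher.test(table(V1 > -0.4, V2))$p.value
```

Penalized (Firth) logistic regression is another standard remedy when separated data must still yield finite coefficient estimates.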